


COVID Alert Diagnosis Server

COVID Alert is now retired: For more information, visit the Government of Canada COVID Alert home page.

Adapted from https://github.com/CovidShield/server

This repository implements a diagnosis server to use as a server for Apple/Google's Exposure Notification framework, informed by the guidance provided by Canada's Privacy Commissioners.

The choices made in the implementation are meant to maximize privacy, security, and performance. No personally identifiable information is ever stored, and nothing other than the IP address is available to the server. No data at all is retained past 21 days. This server is designed to handle use by up to 38 million Canadians, though it can be scaled to any population size.

Overview

Apple/Google's Exposure Notification specifications provide important information to contextualize the rest of this document.

There are two fundamental operations conceptually:

  • Retrieving diagnosis keys: retrieving a list of all keys uploaded by other users; and
  • Submitting diagnosis keys: sharing keys returned from the EN framework with the server.

These two operations are implemented as two separate servers (key-submission and key-retrieval) generated from this codebase, and can be deployed independently as long as they share a database. It is also possible to deploy any number of configurations for each of these components, connected to the same database, though there would be little value in deploying multiple configurations of key-retrieval.

For a more technical overview of the codebase, especially of the protocol and database schema, see this video.

Retrieving diagnosis keys

When diagnosis keys are uploaded, the key-submission server stores the data defined and required by the Exposure Notification API in addition to the time at which the data was received by the server. This submission timestamp is rounded to the nearest hour for privacy preservation (to prevent correlation of multiple keys to the same user).

The hour of submission is used to group keys into buckets, so that clients (the COVID Alert mobile app) do not have to download the same set of key data multiple times when repeatedly checking for exposure.
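
For illustration, here is a minimal Go sketch of the rounding and bucketing described above. It is not the repository's actual implementation; the function name is ours.

```go
package main

import (
	"fmt"
	"time"
)

// hourBucket rounds a submission time to the nearest hour, which is the
// granularity described above: all keys received in the same hour share a bucket.
func hourBucket(received time.Time) time.Time {
	return received.UTC().Round(time.Hour)
}

func main() {
	received := time.Date(2020, 5, 12, 14, 37, 9, 0, time.UTC)
	fmt.Println(hourBucket(received)) // 2020-05-12 15:00:00 +0000 UTC
}
```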

The published diagnosis keys are fetched—with some best-effort authentication—from a Content Distribution Network (CDN), backed by key-retrieval. This allows a functionally-arbitrary number of concurrent users.

Retrieving Exposure Configuration

Exposure Configuration, used to determine the risk of a given exposure, is also retrieved from the key-retrieval server. A JSON document describing the current exposure configuration for a given region is available at the path /exposure-configuration/<region>.json, e.g. for Ontario (region ON):

$ curl https://retrieval.covidshield.app/exposure-configuration/ON.json
{"minimumRiskScore":0,"attenuationLevelValues":[1,2,3,4,5,6,7,8],"attenuationWeight":50,"daysSinceLastExposureLevelValues":[1,2,3,4,5,6,7,8],"daysSinceLastExposureWeight":50,"durationLevelValues":[1,2,3,4,5,6,7,8],"durationWeight":50,"transmissionRiskLevelValues":[1,2,3,4,5,6,7,8],"transmissionRiskWeight":50}

Submitting diagnosis keys

In brief, upon receiving a positive diagnosis, a health care professional will generate a One Time Code through a web application frontend (COVID Alert Portal), which communicates with key-submission. This code is sent to the patient, who enters the code into their COVID Alert mobile app. This code is used to authenticate the Application (once) to the diagnosis server. Encryption keypairs are exchanged by the Application and the key-submission server to be stored for fourteen days, and the One Time Code is immediately purged from the database.

These keypairs are used to encrypt and authorize Diagnosis Key uploads for the next fourteen days, after which they are purged from the database.

The encryption scheme employed for key upload is NaCl Box (a public-key encryption scheme using Curve25519, XSalsa20, and Poly1305). This is widely regarded as an exceedingly secure implementation of Elliptic-Curve cryptography.
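
For illustration, a minimal Go sketch of the NaCl Box scheme described above, using golang.org/x/crypto/nacl/box. The payload and key names are placeholders; the actual upload format is defined in the proto subdirectory.

```go
package main

import (
	"crypto/rand"
	"fmt"

	"golang.org/x/crypto/nacl/box"
)

func main() {
	// Each side generates a Curve25519 keypair; in COVID Alert these are the
	// keypairs exchanged when a one-time code is claimed.
	appPub, appPriv, err := box.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	serverPub, serverPriv, err := box.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}

	// A fresh 24-byte nonce must accompany every message.
	var nonce [24]byte
	if _, err := rand.Read(nonce[:]); err != nil {
		panic(err)
	}

	payload := []byte("serialized diagnosis key upload (illustrative)")

	// The app encrypts for the server: XSalsa20 for secrecy, Poly1305 for integrity.
	sealed := box.Seal(nil, payload, &nonce, serverPub, appPriv)

	// The server decrypts using its private key and the app's public key.
	opened, ok := box.Open(nil, sealed, &nonce, appPub, serverPriv)
	fmt.Println(ok, string(opened))
}
```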

Data usage

The Diagnosis Key retrieval protocol used in COVID Alert was designed to restrict the data transfer to a minimum. With large numbers of keys and assuming the client fetches using compression, there is minimal protocol overhead on top of the key data size of 16 bytes.

In all examples below:

  • Each case may generate up to 28 keys.
  • Keys are valid and distributed for 14 days.
  • Each key entails just under 18 bytes of data transfer when using compression.
  • Key metadata and protocol overhead should in reality be minimal, but:
  • Assume 50% higher numbers than you see below to be on the safe side. This README will be updated soon with more accurate real-world data sizes.

Data below is current as of May 12, 2020. For each case, we assume the example daily new-case count recurs steadily every day.
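
The per-region estimates that follow all apply the same calculation: daily cases × 28 keys per case × roughly 18 bytes per key, spread over 24 hourly fetches. A small Go sketch of that arithmetic (the figures quoted below are rounded):

```go
package main

import "fmt"

func estimate(name string, dailyCases int) {
	const keysPerCase = 28
	const bytesPerKey = 18 // just under 18 bytes per key with compression

	perDay := dailyCases * keysPerCase * bytesPerKey
	perHour := perDay / 24
	fmt.Printf("%s: ~%d bytes/day, ~%d bytes/hour\n", name, perDay, perHour)
}

func main() {
	estimate("Ontario (350 cases/day)", 350)  // ~176400 bytes/day (~170 kB), ~7 kB/hour
	estimate("Canada (1100 cases/day)", 1100) // ~554400 bytes/day (~540 kB), ~23 kB/hour
	estimate("USA (18000 cases/day)", 18000)  // ~9 MB/day, ~370 kB/hour
}
```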

Deployed only to province of Ontario

There were 350 new cases in Ontario on May 10, 2020. 350 * 28 * 18 = 170kB per day, thus, deploying to the province of Ontario at current infection rates would cause 7.1kB of download each hour.

Deployed to Canada

There were 1100 new cases in Canada on May 10, 2020. 1100 * 28 * 18 = 540kB per day, thus, deploying to Canada at current infection rates would cause 23kB of download each hour.

Deployed to entire United States of America

There were 18,000 new cases in America on May 10, 2020. 18,000 * 28 * 18 = 8.9MB per day, thus, deploying to all of America at current infection rates would cause 370kB of download each hour.

Deployed to entire world

If COVID Alert were deployed for the entire world, we would be inclined to use the "regions" built into the protocol to implement key namespacing, in order to avoid serving the entire set of global diagnosis keys to each and every person in the world, but let's work through the numbers in the case that we wouldn't:

There were 74,000 new cases globally on May 10, 2020. 74,000 * 28 * 18 = 36MB per day, thus, deploying to the entire world at current infection rates would cause 1.5MB of download each hour.

Generating one-time codes

We use a one-time code generation scheme that allows authenticated case workers to issue codes, which are to be passed to patients with positive diagnoses via whatever communication channel is convenient.

This depends on a separate service, holding credentials to talk to this (key-submission) server. We have a sample implementation we will open source soon, but we anticipate that health authorities will prefer to integrate this feature into their existing systems. The integration is extremely straightforward, and we have minimal examples in several languages. Most minimally:

curl -XPOST -H "Authorization: Bearer $token" "https://submission.covidshield.app/new-key-claim"
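
For illustration, the same minimal request written in Go; the bearer token is read from an environment variable, and the host is the same placeholder used in the curl example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	req, err := http.NewRequest(http.MethodPost, "https://submission.covidshield.app/new-key-claim", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("TOKEN"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// Prints the response; see the protocol documentation in the proto
	// subdirectory for the exact response format.
	fmt.Println(resp.Status, string(body))
}
```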

Protocol documentation

For a more in-depth description of the protocol, please see the "proto" subdirectory of this repo.

Deployment notes

  • key-submission depends on being deployed behind a firewall (e.g. AWS WAF), aggressively throttling users with 400 and 401 responses.

  • key-retrieval assumes it will be deployed behind a caching reverse proxy.

Platforms

We hope to provide reference implementations on AWS, GCP, and Azure via Hashicorp Terraform.

Amazon AWS

Kubernetes

Metrics and Tracing

COVID Alert uses OpenTelemetry to configure metrics and tracing for both the key-retrieval and key-submission servers.

Metrics

Currently, the following options are supported for enabling Metrics:

  • standard output
  • prometheus

Metrics can be enabled by setting the METRIC_PROVIDER variable to stdout, pretty, or prometheus.

Both stdout and pretty will send metrics output to stdout but differ in their formatting. stdout will print the metrics as JSON on a single line whereas pretty will format the JSON in a human-readable way, split across multiple lines.

If you want to use Prometheus, please see the additional configuration requirements below.

Server Events

The server tracks the following events, aggregated by day or hour and by originator (Bearer Token).

OTKGenerated

This tracks the number of One Time Keys generated by calling the /new-key-claim endpoint.

OTKClaimed

This tracks the number of One Time Keys claimed by calling the /claim-key endpoint. This is done when a One Time Key is entered by a citizen into their phone.

OTKUnclaimed

This tracks the number of One Time Keys that are unclaimed and are younger than the config.AppConstants.OneTimeCodeExpiryInMinutes configuration value.

OTKExpired

This tracks the number of claimed One Time Keys that have expired in the database. A key expires when it has been claimed and is older than config.AppConstants.EncryptionKeyValidityDays.

OTKExhausted

This tracks the number of claimed One Time Keys that have 0 for the remaining_keys field in the encryption_keys table.

OTKRegenerated

This tracks the number of times the /new-key-claim endpoint is called with an existing hashID. When this occurs, the existing One Time Key is deleted and a new one is generated in its place.

OTKExpiredNoUploads

This tracks the number of One Time Keys that have been claimed and expired with no uploads of Temporary Exposure Keys.

OTKDurations

This tracks how long One Time Keys remain unclaimed, rounded up to the nearest hour.
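
For illustration, a minimal Go sketch of the rounding described for OTKDurations; the function and variable names are ours, not the repository's.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// unclaimedHours returns the time a key spent unclaimed, in whole hours rounded up.
func unclaimedHours(created, claimed time.Time) int {
	return int(math.Ceil(claimed.Sub(created).Hours()))
}

func main() {
	created := time.Date(2020, 5, 12, 9, 0, 0, 0, time.UTC)
	claimed := time.Date(2020, 5, 12, 10, 20, 0, 0, time.UTC)
	fmt.Println(unclaimedHours(created, claimed)) // 2 (1h20m rounds up to 2 hours)
}
```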

Prometheus

In order to use Prometheus as a metrics solution, you'll need to be running it in your environment.

You can follow the instructions here for running Prometheus.

You will need to edit the configuration file, prometheus.yml, to add an additional target so it actually polls the metrics coming from the COVID Alert server:

...
    static_configs:
    - targets: ['localhost:9090', 'localhost:2222']

Tracing

Currently, the following options are supported for enabling Tracing:

  • standard output

Tracing can be enabled by setting the TRACER_PROVIDER variable to stdout or pretty.

Both stdout and pretty will send trace output to stdout but differ in their formatting. stdout will print the trace as JSON on a single line whereas pretty will format the JSON in a human-readable way, split across multiple lines.

Note that logs are emitted to stderr, so with stdout mode, logs will be on stderr and metrics will be on stdout.

Contributing

See the Contributing Guidelines.

Who Built COVID Alert?

COVID Alert was originally developed by volunteers at Shopify. It was released free of charge under a flexible open-source license.

This repository is being developed by the Canadian Digital Service. We can be reached at [email protected].


COVID Alert Diagnosis Server

COVID Alert has been retired: For more information, visit the Government of Canada COVID Alert home page.

Adapted from https://github.com/CovidShield/server (see the changes)

This repository implements a diagnosis server to use as a server for Apple/Google's Exposure Notification framework, informed by the guidance provided by Canada's Privacy Commissioners.

The choices made in the implementation are meant to maximize privacy, security, and performance. Personally identifiable information is never stored, and nothing other than the IP address is available to the server. No data is retained past 21 days. This server is designed to handle up to 38 million Canadian users, though it can be scaled to any population size.

Overview

Apple/Google's Exposure Notification specifications provide important information to contextualize the rest of this document.

There are two fundamental operations conceptually:

  • Retrieving diagnosis keys: retrieving a list of all keys uploaded by other users;
  • Submitting diagnosis keys: sharing keys returned by the Exposure Notification framework with the server.

These two operations are implemented as two separate servers (key-submission and key-retrieval) generated from this codebase, and can be deployed independently as long as they share a database. It is also possible to deploy any number of configurations of each of these components, connected to the same database, though there would be little value in deploying multiple configurations of key-retrieval.

For a more technical overview of the codebase, especially the protocol and database schema, see this video.

Retrieving diagnosis keys

When diagnosis keys are uploaded, the key-submission server stores the data defined and required by the Exposure Notification API, along with the time at which the data was received by the server. This submission timestamp is rounded to the nearest hour to protect privacy (to prevent correlation of multiple keys to the same user).

The hour of submission is used to group keys into buckets, so that clients (the COVID Alert mobile app) do not have to download the same set of key data multiple times when repeatedly checking for exposure.

The published diagnosis keys are fetched (with some best-effort authentication) from a Content Distribution Network (CDN), backed by key-retrieval. This allows a functionally arbitrary number of concurrent users.

Retrieving Exposure Configuration

The Exposure Configuration, used to determine the risk of a given exposure, is also retrieved from the key-retrieval server. A JSON document describing the current exposure configuration for a given region is available at the path /exposure-configuration/<region>.json, e.g. for Ontario (region ON):

$ curl https://retrieval.covidshield.app/exposure-configuration/ON.json
{"minimumRiskScore":0,"attenuationLevelValues":[1,2,3,4,5,6,7,8],"attenuationWeight":50,"daysSinceLastExposureLevelValues":[1,2,3,4,5,6,7,8],"daysSinceLastExposureWeight":50,"durationLevelValues":[1,2,3,4,5,6,7,8],"durationWeight":50,"transmissionRiskLevelValues":[1,2,3,4,5,6,7,8],"transmissionRiskWeight":50}

Submitting diagnosis keys

In brief, upon a positive diagnosis, a health care professional will generate a One Time Code through a web application frontend (COVID Alert Portal) that communicates with key-submission. This code is sent to the patient, who enters it into their COVID Alert mobile app. The code is used to authenticate the application (once) to the diagnosis server. Encryption keypairs are exchanged by the application and the key-submission server and stored for fourteen days, and the One Time Code is immediately purged from the database.

These keypairs are used to encrypt and authorize Diagnosis Key uploads for the following fourteen days, after which they are removed from the database.

The encryption scheme used for key upload is NaCl Box (a public-key encryption scheme using Curve25519, XSalsa20, and Poly1305). It is widely regarded as an exceedingly secure implementation of elliptic-curve cryptography.

Data usage

The Diagnosis Key retrieval protocol used in COVID Alert was designed to keep data transfer to a minimum. With large numbers of keys, and assuming the client fetches using compression, there is minimal protocol overhead on top of the 16-byte key data size.

In all examples below:

  • Each case may generate up to 28 keys.
  • Keys are valid and distributed for 14 days.
  • Each key entails just under 18 bytes of data transfer when compression is used.
  • Key metadata and protocol overhead should in reality be minimal, but:
  • Assume numbers 50% higher than those below to be on the safe side. This README will be updated soon with more accurate real-world data sizes.

The data below is current as of May 12, 2020. For each case, we assume the example daily new-case count recurs steadily every day.

Deployed only to the province of Ontario

There were 350 new cases in Ontario on May 10, 2020: 350 * 28 * 18 = 170 kB per day. Thus, deploying to the province of Ontario at current infection rates would cause 7.1 kB of download each hour.

Deployed to Canada

There were 1,100 new cases in Canada on May 10, 2020: 1100 * 28 * 18 = 540 kB per day. Thus, deploying to Canada at current infection rates would cause 23 kB of download each hour.

Deployed to the entire United States of America

There were 18,000 new cases in the United States on May 10, 2020: 18,000 * 28 * 18 = 8.9 MB per day. Thus, deploying to the entire United States at current infection rates would cause 370 kB of download each hour.

Deployed to the entire world

If COVID Alert were deployed for the entire world, we would be inclined to use the "regions" built into the protocol to implement key namespacing, in order to avoid serving the entire set of global diagnosis keys to every person in the world. But let's work through the numbers in the case that we wouldn't:

There were 74,000 new cases globally on May 10, 2020: 74,000 * 28 * 18 = 36 MB per day. Thus, deploying to the entire world at current infection rates would cause 1.5 MB of download each hour.

Generating one-time codes

We use a one-time code generation scheme that allows authenticated case workers to issue codes, which are passed to patients with positive diagnoses via whatever communication channel is convenient.

This depends on a separate service that holds credentials to talk to this (key-submission) server. We have a sample implementation whose source will be opened soon, but we anticipate that health authorities will prefer to integrate this feature into their existing systems. The integration is extremely straightforward, and we have minimal examples in several languages. Most minimally:

curl -XPOST -H "Authorization: Bearer $token" "https://submission.covidshield.app/new-key-claim"

Protocol documentation

For a more detailed description of the protocol, please see the "proto" subdirectory of this repo.

Deployment notes

  • key-submission depends on being deployed behind a firewall (e.g. AWS WAF) that aggressively throttles users with 400 and 401 responses.

  • key-retrieval assumes it will be deployed behind a caching reverse proxy.

Platforms

We hope to provide reference implementations on AWS, GCP, and Azure via Hashicorp Terraform.

Amazon AWS

Kubernetes

Metrics and Tracing

COVID Alert uses OpenTelemetry to configure metrics and tracing for both the key-retrieval and key-submission servers.

Metrics

Currently, the following options are supported for enabling metrics:

  • standard output
  • prometheus

Metrics can be enabled by setting the METRIC_PROVIDER variable to stdout, pretty, or prometheus.

Both stdout and pretty will send metrics output to stdout but differ in their formatting. stdout will print the metrics as JSON on a single line, whereas pretty will format the JSON in a human-readable way, split across multiple lines.

If you want to use Prometheus, please see the additional configuration requirements below.

Server Events

The server tracks the following events, aggregated by day or hour and by originator (Bearer Token).

OTKGenerated

This tracks the number of One Time Keys (OTKs) generated by calling the /new-key-claim endpoint.

OTKClaimed

This tracks the number of One Time Keys (OTKs) claimed by calling the /claim-key endpoint. This is done when a One Time Key is entered by a citizen into their app.

OTKUnclaimed

This tracks the number of One Time Keys (OTKs) that are unclaimed and are younger than the config.AppConstants.OneTimeCodeExpiryInMinutes configuration value.

OTKExpired

This tracks the number of One Time Keys (OTKs) that have expired in the database. A key expires when it has been claimed and is older than config.AppConstants.EncryptionKeyValidityDays.

OTKExhausted

This tracks the number of One Time Keys (OTKs) that have 0 for the remaining_keys field in the encryption_keys table.

OTKRegenerated

This tracks the number of times the /new-key-claim endpoint is called with an existing hashID. When this occurs, the existing One Time Key is deleted and a new one is generated in its place.

OTKExpiredNoUploads

This tracks the number of One Time Keys (OTKs) that have been claimed and expired without any Temporary Exposure Keys being uploaded.

OTKDurations

This tracks how long One Time Keys (OTKs) remain unclaimed, in hours, rounded up.

Prometheus

To use Prometheus as a metrics solution, you need to run it in your environment.

You can follow the instructions here for running Prometheus.

You will need to edit the configuration file, prometheus.yml, to add an additional target so that it actually polls the metrics coming from the COVID Alert server:

...
    static_configs:
    - targets: ['localhost:9090', 'localhost:2222']

Tracing

Currently, the following options are supported for enabling tracing:

  • standard output

Tracing can be enabled by setting the TRACER_PROVIDER variable to stdout or pretty.

Both stdout and pretty will send trace output to stdout but differ in their formatting. stdout will print the trace as JSON on a single line, whereas pretty will format the JSON in a human-readable way, split across multiple lines.

Note that logs are emitted to stderr, so with stdout mode, logs will be on stderr and metrics will be on stdout.

Contributing

See the Contributing Guidelines.

Who Built COVID Alert?

COVID Alert was originally developed by volunteers at Shopify. It was released free of charge under a flexible open-source license.

This repository is maintained by the Canadian Digital Service. You can reach us at [email protected].


covid-alert-server's Issues

Add a private ECR container registry

Add a Terraform script to create a new ECR container registry.

The upstream GitHub Actions reference hard-coded public Docker Hub repositories. PR #9 moves these settings into secrets, and also adds a registry URL so we can use a custom private repo, at least until we're able to go public with this work.

SRV-H-1.8: Start up migrations

Finding:
The application performs schema updates on start-up, which grants it more privileges than strictly needed.

Recommendation:
Deployment should allow relegating schema changes (migrator functions) to a separate task that uses a dedicated database role to perform DDL. Long-running application processes should allow operating in a mode that only requires DML.
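
As a sketch of what this recommendation could look like, assuming an operator-controlled environment flag (the flag name, environment variables, and helper functions below are hypothetical, not part of this codebase):

```go
// A hypothetical sketch of the recommended separation: the long-running server
// only performs DML, while schema migrations run as a separate one-off task
// (e.g. a dedicated ECS task) under a database role that is allowed DDL.
package main

import (
	"log"
	"os"
)

func main() {
	if os.Getenv("RUN_MIGRATIONS") == "true" {
		// One-off task: connects with the DDL-capable role, applies pending
		// schema migrations, then exits.
		if err := runMigrations(os.Getenv("MIGRATOR_DATABASE_URL")); err != nil {
			log.Fatal(err)
		}
		return
	}

	// Normal operation: connects with a DML-only role and never issues DDL.
	if err := serve(os.Getenv("DATABASE_URL")); err != nil {
		log.Fatal(err)
	}
}

func runMigrations(databaseURL string) error { /* apply schema changes */ return nil }
func serve(databaseURL string) error         { /* run the HTTP server */ return nil }
```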

SRV-H-1.10: Cloudfront methods

Finding:
CloudFront behavior for Retrieval service permits unused HTTP methods

Recommendation:
CloudFront behavior should be configured to only allow GET and HEAD, as other methods are not used

Switch MySQL -> Aurora

We are certain that, for us, this app will live in AWS. The high-availability nature of Aurora and the extensive support offered by AWS make this a reasonable alternative to MySQL.

SRV-H-1.1: Terraform bootstrap S3 bucket

Finding:
Lack of automation for creating the S3 bucket for Terraform state could result in leaking secrets.

Recommendation:
Include Terraform bootstrapping to correctly provision a non-public server-side encrypted S3 bucket for remote state

SRV-H-1.14: Terraform secrets

Finding:
Terraform destroy and recreate will fail due to the 30-day recovery window of secrets in Secrets Manager: secrets cannot be re-created on the second pass because they are scheduled for deletion.

Recommendation:
The Terraform config could have an option to set recovery_window_in_days=0 on secrets, which could be used in test environments

VA78 - HealthCare API Key Alarm

A 401 on OTC API key failure should, beyond a certain threshold, trigger an alarm/alert through existing alerting streams.

Decision needed: what is the appropriate threshold?
Note: IP address should not be a factor, to protect against those who use IP hopping.

This came from the vulnerability analysis; originally Loudmouth Recommendation 2, but re-scoped to make sense for the application

SRV-H-1.4: ECS to RDS traffic

Finding:
Communication between ECS services and RDS is unencrypted

Recommendation:
Update server code to enable TLS to RDS
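
One way this recommendation could be implemented with go-sql-driver/mysql is sketched below; the CA bundle path, DSN, and TLS config name are illustrative assumptions, not the repository's configuration.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"database/sql"
	"log"
	"os"

	"github.com/go-sql-driver/mysql"
)

func main() {
	// RDS CA bundle, downloaded from AWS and baked into the image or mounted.
	pem, err := os.ReadFile("/etc/ssl/rds-combined-ca-bundle.pem")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pem) {
		log.Fatal("failed to parse RDS CA bundle")
	}

	// Register a named TLS config that trusts the RDS CA.
	if err := mysql.RegisterTLSConfig("rds", &tls.Config{RootCAs: pool}); err != nil {
		log.Fatal(err)
	}

	// The DSN opts into the registered TLS config with tls=rds.
	db, err := sql.Open("mysql", "user:pass@tcp(db.example.internal:3306)/covidshield?tls=rds")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}
```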

VA77 - Make HealthCare API Error Message Generic

Authentication weakness
When a health care provider attempts to generate a new one-time code, to be provided to a patient for upload of exposure keys, they authenticate to the service with a hexadecimal key in an HTTP header. This single-factor key could be brute-forced by an attacker. Additionally, the server provides a different error message for an invalid key than for other errors, so an attacker gets feedback to determine when a valid key is found.

REC-1 GENERIC ERROR MESSAGES: The service should return a generic error message for any server error, to avoid giving an attacker too much information about the source of the error.

This came from the vulnerability analysis; originally Loudmouth Recommendation 1
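
A hypothetical sketch of REC-1: the handler logs the specific failure server-side but always returns the same generic message and status to the client, so authentication failures are indistinguishable from other errors. The handler, route, and variable names are illustrative, not the repository's code.

```go
package main

import (
	"errors"
	"log"
	"net/http"
)

var errInvalidToken = errors.New("invalid bearer token")

func newKeyClaimHandler(w http.ResponseWriter, r *http.Request) {
	if err := authorize(r); err != nil {
		log.Printf("new-key-claim rejected: %v", err) // the detail stays in server logs
		http.Error(w, "unable to process request", http.StatusUnauthorized)
		return
	}
	w.Write([]byte("ok"))
}

func authorize(r *http.Request) error {
	if r.Header.Get("Authorization") == "" {
		return errInvalidToken
	}
	return nil
}

func main() {
	http.HandleFunc("/new-key-claim", newKeyClaimHandler)
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```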

Make terraform script more generic

Currently some Terraform variables are hard-coded and should be configurable:

  • route53_zone_name needs to be configurable
  • Secret key TF names are not unique between deploys. This means that if you tear down the TF, because secrets take days to delete, you can't redeploy the Terraform right away. e.g. fix key-retrieval-env-hmac-key -> key-retrieval-env-hmac-key-${random_string.random.result}
  • The TF backend bucket name needs to be configurable: terraform { backend "s3" { bucket = ""
  • The GitHub provider needs to be generic: provider "github" { organization = "CovidShield"

Flaky DB connection and time-based failing test

On roughly one in every three CI test runs, the Ruby dependency step completes but the protobuf dependencies are not installed:

Bundle complete! 8 Gemfile dependencies, 12 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.
          protoc (go)  pkg/proto/covidshield/proto.pb.go
/bin/sh: 1: protoc: not found
Makefile:48: recipe for target 'pkg/proto/covidshield/proto.pb.go' failed
make: *** [pkg/proto/covidshield/proto.pb.go] Error 127

Re-running fixes this, but it should never happen.

SRV-H-2.2: Image vulnerabilities

Findings:
Docker Hub images are implicitly trusted

Recommendation:
Deployment pipeline should codify a process to scan container images and gate deployment based on appropriate vulnerability thresholds

SRV-H-1.2: Application metrics

Finding:
Server processes do not present application metrics

Recommendation:
Server processes should expose metrics necessary to assess internal app level health, such as Go runtime metrics, counters for different failure conditions (e.g. various modes of auth failures), etc

[For Discussion] Lack of CORS implementation

"There are comments to this in the code, but the CORS headers are hard coded as * for now. I would prefer to see that set from a config file or environment variable."

https://github.com/cds-snc/covid-shield-server/blob/d7efce28ea20ce1217a7e67adf0f948461c2ba62/pkg/server/keyclaim.go#L40

The current implementation of CORS is stubbed out with a TODO. Adding this issue to ensure we close this out in accordance with best practices, and also to drop Observatory scan results in here if desired.
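
A hypothetical sketch of making the CORS header configurable via an environment variable; the CORS_ALLOWED_ORIGIN name and the middleware are illustrative, not the repository's code.

```go
package main

import (
	"log"
	"net/http"
	"os"
)

// withCORS sets the Access-Control-Allow-Origin header from configuration
// instead of hard-coding "*".
func withCORS(next http.Handler) http.Handler {
	origin := os.Getenv("CORS_ALLOWED_ORIGIN")
	if origin == "" {
		origin = "*" // current behaviour, kept as the fallback
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Access-Control-Allow-Origin", origin)
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/claim-key", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8000", withCORS(mux)))
}
```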

SRV-M-1: Hardcoded configs

Findings:
Several values which should be configurable are hardcoded, e.g. maxDiagnosisKeyRetentionDays, initialRemainingKeys, encryptionKeyValidityDays, oneTimeCodeExpiryInMinutes

Recommendation:
Expose these and other hardcoded values as configurable properties
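
A hypothetical sketch of exposing these values through the environment with fallbacks. The variable names and most defaults are illustrative assumptions; only the 21-day retention, 28 keys per case, and 14-day validity figures come from this README.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt reads an integer from the environment, falling back to a default.
func envInt(name string, fallback int) int {
	if v, ok := os.LookupEnv(name); ok {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return fallback
}

type AppConstants struct {
	MaxDiagnosisKeyRetentionDays int
	InitialRemainingKeys         int
	EncryptionKeyValidityDays    int
	OneTimeCodeExpiryInMinutes   int
}

func main() {
	constants := AppConstants{
		MaxDiagnosisKeyRetentionDays: envInt("MAX_DIAGNOSIS_KEY_RETENTION_DAYS", 21),
		InitialRemainingKeys:         envInt("INITIAL_REMAINING_KEYS", 28),
		EncryptionKeyValidityDays:    envInt("ENCRYPTION_KEY_VALIDITY_DAYS", 14),
		OneTimeCodeExpiryInMinutes:   envInt("ONE_TIME_CODE_EXPIRY_IN_MINUTES", 1440), // placeholder default
	}
	fmt.Printf("%+v\n", constants)
}
```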

SRV-H-2.3: Docker registry

Findings:
ECS pulls images directly from Docker Hub

Recommendations:
Approved/cleared images should be propagated to an AWS ECR repository which serves ECS rather than Docker Hub

SRV-H-1.13: Retrieval WAF

Finding:
Retrieval service lacks WAF

Recommendation:
Retrieval service only presents GET /retrieve so at the moment a WAF isn't very important, but for parity with the Submission service and for future-proofing, attaching a WAF to the Retrieval service ALB (or perhaps the CloudFront Distribution) could be worthwhile

SRV-H-2.4: Terraform user

Findings:
Terraform uses AWS access keys for account access

Recommendations:
Use a SAML-compliant IdP for federated access

SRV-H-2.1: Alarms

Finding:
Deployment lacks alarming

Recommendation:
Appropriate thresholds should be placed on KPIs for alarming

SRV-H-1.6: VPC flow logging

Finding:
VPC lacks flow logging

Recommendation:
Enable VPC flow logging to an encrypted target (either non-public S3 bucket with server-side encryption or encrypted CloudWatch Log group)

SRV-H-2.9: Terraform locking

Finding:
Terraform S3 backend does not use Dynamo for locking

Recommendation:
Include provisioning of a Dynamo table in the Terraform bootstrapping (previous recommendation) and configure the backend to use it

[For discussion] Certificate Pinning as a potential threat mitigation.

"6) there does not appear to be any configuration to allow for certificate pinning. If a user is connected to an untrusted mobile hotspot they may be coerced into communicating with a rogue Covid shield server."

As raised by an external threat audit, the lack of certificate pinning in the app was raised as a potential concern.

I'm not entirely up-to-speed with the complexity this may add into the deployment/release process, but before going down this route, I'd like to have a well described threat model in place.

I don't see the initial example above as a valid attack vector, in that regardless of the 'sketchiness' of an untrusted hotspot, the browser will still only trust Certificates from a set of trusted Certificate Authorities, none of which is the mobile hot-spot.

SRV-H-1.9: NACL is permissive

Finding:
Default NACL is permissive

Recommendation:
As part of a defense-in-depth strategy, NACLs should minimally be created to block inbound ssh (and rdesktop for good measure) from the Internet (AWS Session Manager should be used for system access)

SRV-H-2.5: NAT Gateway

Findings:
NAT gateway lacks cross-AZ redundancy

Recommendations:
Deploy an additional NAT gateway in another availability zone to handle AZ failure mode

SRV-H-2.8: Terraform state management

Finding:
Terraform state is monolithic which results in a larger failure blast radius and more difficulty in manually reconciling state inconsistencies

Recommendation:
Split resources (either individually or groups of tightly-related resources) out into separate Terraform configs, managed by independent state, similar to the separation that exists between Server and Portal in the reference implementation. Terragrunt may be used to abstract and simplify this process

[For Discussion] Logical separation of claim-key and new-key-claim endpoints.

"The new-key-claim and claim-key enpoints are used by two different sets of users (i.e. health-care workers, vs app users). I would prefer to see this functionality split so that new-key-claim can be implemented on a server tightly controlled by health care, and claim-key can be implemented on a public server"

From external threat analysis, dropping this in here for discussion as to the most effective and simple way of segmenting these and scaling them appropriately.

Enable Snyk

Snyk should be enabled with notifications to appropriate team members.

SRV-H-1.12: Portal security group

Finding:
Portal and API server ALBs share the same security group, with API Server Terraform defining egress rules for Portal

Recommendation:
Remove Portal-related rules from the API server Terraform, and create a separate Portal-specific security group in the Portal Terraform

SRV-H-1.15: Terraform ACM

Finding:
ACM certificates will be stuck pending without manual operator intervention to set up DNS zone delegation after Route53 Hosted Zone is created

Recommendation:
As it's unclear how this should work without manual intervention, perhaps a note in the README to warn about this action being necessary during deployment. Alternatively, allow use of a pre-created hosted zone.

Bootstrap Terraform deployment for Core services

Enable a Terraform deployment split into two groups of resources, with some examples categorized below.

This ensures continuity and any changes to the static resources are done methodically.

  1. Ephemeral resources
  • ECS
  • CloudWatch
  2. Static resources
  • KMS
  • VPC
  • S3
  • Database

@maxneuvians @Ginja - WDYT?

Risk: if something goes wrong in Terraform there could be greater impact. This is blast radius containment.

SRV-H-2.7: KEY_CLAIM_TOKEN configuration

Finding:
Key Claim Tokens are fixed (via KEY_CLAIM_TOKEN environment variable) and so provisioning tokens for new healthcare professionals or rotating existing tokens requires a service restart.

Recommendation:
Allow tokens to be dynamically configurable

Expose correct ports on devContainer and set DATABASE_URL env

Currently the dev container is not accessible from the host system because the ports that the servers run on are not exposed. Additionally, the DATABASE_URL environment variable is not set in the container; it should be set to make running the servers in the container easier.
