
classify-client's Introduction

Classify Client

This is an optimized version of the classify client endpoint in Normandy.

Dev instructions

This is a normal Cargo project, so after cloning the repository, you can build and run it with

$ cargo build
$ cargo run

This project should run on the latest stable version of Rust. Unstable features are not allowed.

GeoIP Database

A GeoIP database will need to be provided. By default it is expected to be found at ./GeoLite2-Country.mmdb.

Configuration

Via environment variables:

  • DEBUG: Set to "true" to enable extra debugging options, such as a /debug endpoint that shows internal server state (default: "false").
  • GEOIP_DB_PATH: path to GeoIP database (default: "./GeoLite2-Country.mmdb")
  • HOST: host to bind to (default: "localhost")
  • HUMAN_LOGS: set to "true" to use human readable logging (default: MozLog as JSON)
  • METRICS_TARGET: The host and port to send statsd metrics to. May be a hostname like "metrics.example.com:8125" or an IP like "127.0.0.1:8125". Port is required. (default: "localhost:8125")
  • PORT: port number to bind to (default: "8000")
  • SENTRY_DSN: report errors to a Sentry instance (default: "")
  • SENTRY_ENV: Sentry environment (default: "production")
  • SENTRY_SAMPLE_RATE: Sentry sampling rate (default: 1.0)
  • TRUSTED_PROXY_LIST: A comma-separated list of CIDR ranges that trusted proxies will be in. Supports both IPv4 and IPv6.
  • VERSION_FILE: path to version.json file (default: "./version.json")
  • API_KEYS_FILE: path to apiKeys.json file for /v1/country endpoint (default: "./apiKeys.json")
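
As an illustration of how the variables above might be read and defaulted, here is a minimal std-only sketch. The `Settings` type and its `from_lookup`/`from_env` helpers are hypothetical, so the real crate may structure its settings differently. Two details go beyond what the list states, and both are assumptions: TRUSTED_PROXY_LIST defaults to an empty list, and DEBUG/HUMAN_LOGS are enabled only by the exact string "true". The sketch does enforce the documented rule that METRICS_TARGET must include a port.

```rust
/// Service settings read from the environment variables listed above.
/// Hypothetical type: the real crate's settings struct may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub debug: bool,
    pub geoip_db_path: String,
    pub host: String,
    pub human_logs: bool,
    pub metrics_target: String,
    pub port: u16,
    pub sentry_dsn: String,
    pub sentry_env: String,
    pub sentry_sample_rate: f32,
    pub trusted_proxy_list: Vec<String>,
    pub version_file: String,
    pub api_keys_file: String,
}

impl Settings {
    /// Build settings from any key lookup, so tests need not mutate the process environment.
    pub fn from_lookup<F>(get: F) -> Result<Settings, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // Value of `key`, or `default` when the variable is unset.
        let get_or = |key: &str, default: &str| get(key).unwrap_or_else(|| default.to_string());
        // Boolean flags are enabled only by the exact string "true" (assumption).
        let flag = |key: &str| get_or(key, "false") == "true";

        let metrics_target = get_or("METRICS_TARGET", "localhost:8125");
        // The port is required, so insist on a non-empty host and a trailing ":<u16>".
        match metrics_target.rsplit_once(':') {
            Some((host, port)) if !host.is_empty() && port.parse::<u16>().is_ok() => {}
            _ => return Err(format!("METRICS_TARGET must be host:port, got {metrics_target:?}")),
        }

        let port = get_or("PORT", "8000")
            .parse::<u16>()
            .map_err(|e| format!("PORT: {e}"))?;
        let sentry_sample_rate = get_or("SENTRY_SAMPLE_RATE", "1.0")
            .parse::<f32>()
            .map_err(|e| format!("SENTRY_SAMPLE_RATE: {e}"))?;
        // Comma-separated CIDR ranges; the empty default is an assumption.
        let trusted_proxy_list: Vec<String> = get_or("TRUSTED_PROXY_LIST", "")
            .split(',')
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect();

        Ok(Settings {
            debug: flag("DEBUG"),
            geoip_db_path: get_or("GEOIP_DB_PATH", "./GeoLite2-Country.mmdb"),
            host: get_or("HOST", "localhost"),
            human_logs: flag("HUMAN_LOGS"),
            metrics_target,
            port,
            sentry_dsn: get_or("SENTRY_DSN", ""),
            sentry_env: get_or("SENTRY_ENV", "production"),
            sentry_sample_rate,
            trusted_proxy_list,
            version_file: get_or("VERSION_FILE", "./version.json"),
            api_keys_file: get_or("API_KEYS_FILE", "./apiKeys.json"),
        })
    }

    /// Read settings from the real process environment.
    pub fn from_env() -> Result<Settings, String> {
        Settings::from_lookup(|key| std::env::var(key).ok())
    }
}
```

Taking a lookup closure instead of calling `std::env::var` directly keeps the parsing testable without setting process-wide environment variables.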

Tests

Tests can be run with Cargo as well:

$ cargo test

Linting

Linting is handled via Therapist. After installing it, enable the git hooks using either therapist install or therapist install --fix. The --fix variant will automatically format your code upon commit. The variant without --fix will simply show an error and ask you to reformat the code using other means before committing. Therapist runs in CI.

The checks Therapist runs are:

  • Rustfmt
  • Clippy, using the clippy::all preset

Former endpoints from Mozilla Location Services

Endpoints from the Mozilla Location Services project have been migrated to classify-client for continuity.

  • /v1/country - Requires an API key.
    • Downstream Firefox builds can self-select a key that matches this expression: ^firefox-downstream-\w{1,40}$
    • The API key is required here only to get rough usage metrics and to let us reach out to project maintainers if needed in the future.
  • /v1/geolocate - Intentionally not routed, so it returns a 404
  • /v1/geosubmit - Static 403 response
  • /v1/submit - Static 403 response
  • /v2/geosubmit - Static 403 response
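
The endpoint list above can be read as a small routing table. The following sketch assumes a simple path match; `LegacyRoute` and `route_legacy` are hypothetical names, and `key_ok` stands in for whatever key check the service actually performs. The README does not say which status code a rejected key receives, so the sketch returns a distinct variant for that case instead of guessing a code.

```rust
/// What to do with a request to one of the former MLS paths listed above.
#[derive(Debug, PartialEq, Eq)]
pub enum LegacyRoute {
    /// `/v1/country` with an accepted API key: perform the GeoIP lookup.
    Country,
    /// `/v1/country` without an accepted key. The README does not state which
    /// status code is used here, so this sketch leaves it unspecified.
    RejectKey,
    /// A fixed status code with no lookup.
    Static(u16),
}

/// Map a request path (and optional API key) to a `LegacyRoute`.
/// `key_ok` is a placeholder for the service's real key validation.
pub fn route_legacy(path: &str, api_key: Option<&str>, key_ok: impl Fn(&str) -> bool) -> LegacyRoute {
    match path {
        "/v1/country" => match api_key {
            Some(key) if key_ok(key) => LegacyRoute::Country,
            _ => LegacyRoute::RejectKey,
        },
        "/v1/geosubmit" | "/v1/submit" | "/v2/geosubmit" => LegacyRoute::Static(403),
        // /v1/geolocate is deliberately left unrouted, so it gets the same
        // 404 as any unknown path.
        _ => LegacyRoute::Static(404),
    }
}
```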

classify-client's People

Contributors

alexcottner, bors[bot], dependabot[bot], leplatrem, mozilla-github-standards, mythmon, renovate-bot, sciurus, smarnach

classify-client's Issues

Heartbeat is returning 503 even though geoip lookup is working

Shelling into a container shows the geoip db is present. Checking the location metric shows tags for many different countries. However, when you hit https://classify-client.services.mozilla.com/__heartbeat__ (or dev or stage version) it returns a 503 response with the body {"geoip":false}.

I think this is because the heartbeat code tries to geocode 1.2.34 and verify this is a US address. When I plug that address into https://www.maxmind.com/en/geoip-demo, it is reported as being in Russia.
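
The failure mode described above can be sketched as follows. The sketch assumes the heartbeat reduces to "look up a fixed probe address and compare the resulting country code". `heartbeat_geoip` and its `lookup` closure are hypothetical and only stand in for the real dockerflow handler and GeoIP reader. The point is that when the database maps the probe address to a different country, the check returns 503 even though lookups themselves work.

```rust
use std::net::IpAddr;

/// Returns (HTTP status, JSON body) for a heartbeat that checks GeoIP by resolving
/// `probe` and comparing the ISO country code with `expected_iso`.
/// `lookup` is a stand-in for the real GeoIP database query.
pub fn heartbeat_geoip<F>(lookup: F, probe: IpAddr, expected_iso: &str) -> (u16, String)
where
    F: Fn(IpAddr) -> Option<String>,
{
    // A mismatch is reported exactly like a missing database.
    let ok = lookup(probe).as_deref() == Some(expected_iso);
    let status = if ok { 200 } else { 503 };
    (status, format!("{{\"geoip\":{ok}}}"))
}
```

Passing the probe and expected country as parameters makes the pair easy to update alongside database refreshes. Alternatively, the check could be relaxed to "any country was returned". Which option suits the service is a judgment call for the maintainers.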

Wiki changes

FYI: The following changes were made to this repository's wiki:

These were made as the result of a recent automated defacement of publicly writable wikis.

Detect client IP more securely

Currently, an attacker can trivially lie about their IP address, and we trust it. We should figure out a strategy that works along with our deployment strategy to reliably determine the user's IP address.

Another Mozilla project, Channel Server, analyzes headers and takes the first unknown IP from X-Forwarded-For: https://github.com/mozilla-services/channelserver/blob/c38039da6330701394a3bfa723a999e4d4960744/channelserver/src/meta.rs#L144

I'm not sure if this will work in GCP, as I was able to spoof X-Forwarded-For easily. We'll need to figure out how GCP's load balancers handle client IP.
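
Here is a sketch of the "first unknown IP" strategy combined with TRUSTED_PROXY_LIST. It reads "first unknown" as the first untrusted address found when scanning X-Forwarded-For from the right. Entries on the left are client-supplied and can be spoofed, while each trusted proxy appends the address it saw on the right. Whether GCP's load balancer appends the real client address this way is exactly the open question above, so treat that as an assumption. `Cidr` and `client_ip` are hypothetical names, and CIDR matching is done with std only.

```rust
use std::net::IpAddr;

/// A CIDR range such as "10.0.0.0/8" or "2001:db8::/32", as in TRUSTED_PROXY_LIST.
#[derive(Debug, Clone, Copy)]
pub struct Cidr {
    network: IpAddr,
    prefix: u8,
}

impl Cidr {
    /// Parse "addr/prefix"; returns None on malformed input or an out-of-range prefix.
    pub fn parse(s: &str) -> Option<Cidr> {
        let (addr, prefix) = s.trim().split_once('/')?;
        let network: IpAddr = addr.parse().ok()?;
        let prefix: u8 = prefix.parse().ok()?;
        let max_prefix = if network.is_ipv4() { 32 } else { 128 };
        (prefix <= max_prefix).then_some(Cidr { network, prefix })
    }

    /// True if `ip` is inside this range (IPv4 and IPv6 never match each other).
    pub fn contains(&self, ip: IpAddr) -> bool {
        let prefix = u32::from(self.prefix);
        match (self.network, ip) {
            (IpAddr::V4(net), IpAddr::V4(ip)) => {
                // checked_shl returns None for a shift of 32 (a /0 range): match everything.
                let mask = u32::MAX.checked_shl(32 - prefix).unwrap_or(0);
                (u32::from(net) & mask) == (u32::from(ip) & mask)
            }
            (IpAddr::V6(net), IpAddr::V6(ip)) => {
                let mask = u128::MAX.checked_shl(128 - prefix).unwrap_or(0);
                (u128::from(net) & mask) == (u128::from(ip) & mask)
            }
            _ => false,
        }
    }
}

/// Choose the client IP from the socket peer and the X-Forwarded-For header.
pub fn client_ip(peer: IpAddr, forwarded_for: Option<&str>, trusted: &[Cidr]) -> IpAddr {
    let is_trusted = |ip: IpAddr| trusted.iter().any(|c| c.contains(ip));
    if !is_trusted(peer) {
        // The connection did not come through a trusted proxy, so the header is
        // entirely attacker-controlled and must be ignored.
        return peer;
    }
    let mut candidate = peer;
    // Walk from the right-most entry (added by the proxy closest to us) leftwards.
    for entry in forwarded_for.unwrap_or("").rsplit(',') {
        match entry.trim().parse::<IpAddr>() {
            Ok(ip) if !is_trusted(ip) => return ip,
            Ok(ip) => candidate = ip,
            // A malformed hop means nothing further left can be trusted.
            Err(_) => return candidate,
        }
    }
    // Every hop was a trusted proxy; fall back to the left-most one seen.
    candidate
}
```

Checking the socket peer first is the key design choice. The header is only consulted when the request actually arrived from a trusted proxy, so a direct client cannot set X-Forwarded-For to anything it likes.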

Remove use of ci-docker-bases

We want to retire ci-docker-bases. In order to do so, all projects using it must migrate their use to something else. Classify-client is one of those projects.

Progress and notes can be found in the decision brief.

Memory usage grows indefinitely

To reproduce, run the following in three separate shells:

docker run -p 8000:8000/tcp mozilla/classify-client:latest
docker stats
ab -c 1 -n 100000 -k http://localhost:8000/api/v1/classify_client/

Observe that memory usage starts at 5MiB but grows continuously, exceeding 600 MiB by the time all 100k requests have been served.

Allow downstream Firefox builds to self-select API keys for /country

As a part of the MLS sunset process, we are migrating the functionality of one endpoint to classify-client for backwards compatibility: /v1/country. Firefox currently uses this to identify the current region the browser is in (for preferences and defaults).

Previously, downstream builds of Firefox (ex: package maintainer builds for linux distros) would use their MLS key to reach this endpoint. Maintainers and downstream devs would need to apply for a key, Mozilla would need to approve and issue the key, and then MLS would need to authorize every request coming in against the set of known keys.

Going forward we would prefer to allow downstream builds to self-identify. We will allow requests to /v1/country that provide an API key that is either:

  1. An existing known MLS key for a Firefox build, including existing downstream keys, or
  2. A key following the pattern /^firefox\-downstream\-\w{1,40}$/. Example: firefox-downstream-debian_13_0.
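
The two acceptance rules above can be sketched without a regex crate. One caveat matters here. This sketch treats `\w` as ASCII `[A-Za-z0-9_]`, but some engines (including Rust's `regex` crate by default) make `\w` Unicode-aware, which would also admit non-ASCII letters. The function names and the `HashSet` of known keys are hypothetical, since the real service loads its keys from apiKeys.json in whatever shape that file uses.

```rust
use std::collections::HashSet;

/// True if `key` matches ^firefox-downstream-\w{1,40}$, treating \w as ASCII [A-Za-z0-9_].
pub fn is_downstream_key(key: &str) -> bool {
    match key.strip_prefix("firefox-downstream-") {
        // Once every byte is checked to be ASCII, byte length equals character count.
        Some(suffix) => {
            (1..=40).contains(&suffix.len())
                && suffix.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'_')
        }
        None => false,
    }
}

/// Rule 1: a known MLS key; rule 2: a self-selected downstream key.
pub fn key_accepted(key: &str, known_mls_keys: &HashSet<String>) -> bool {
    known_mls_keys.contains(key) || is_downstream_key(key)
}
```

Note that hyphens are not word characters, so a key like firefox-downstream-debian-13 is rejected. This is why the example above uses underscores.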

Consider running Therapist in CI

Currently, Therapist is configured to work for developer machines, and the linters are run directly in CI. It may be a good idea to use Therapist to run the linters in CI as well, to keep a consistent system.

This would involve setting up a Docker image that provides Therapist and Python and using that in CI, instead of the basic Rust image. It may be good to put this image in mozilla/ci-docker-bases.

CODE_OF_CONDUCT.md file missing

As of January 1, 2019, Mozilla requires that all GitHub projects include this CODE_OF_CONDUCT.md file in the project root. The file has two parts:

  1. Required Text - All text under the headings Community Participation Guidelines and How to Report, are required, and should not be altered.
  2. Optional Text - The Project Specific Etiquette heading provides a space to speak more specifically about ways people can work effectively and inclusively together. Some examples of those can be found on the Firefox Debugger project, and Common Voice. (The optional part is commented out in the raw template file, and will not be visible until you modify and uncomment that part.)

If you have any questions about this file, or Code of Conduct policies and procedures, please see Mozilla-GitHub-Standards or email [email protected].

(Message COC001)

Add Instrumentation

We should have some instrumentation so we can know what classify client is doing. For the rationale behind these suggestions, see The First Four Things You Measure.

I think at a minimum we want a timer to track the number and latency of successful and failed responses. Something like the name classify-client.response tagged either status:success or status:error?

Since classify-client is asynchronous and can be handling multiple requests at once, it would be nice to have a gauge that is incremented when a request is received and decremented when we've finished sending the response. Something like the name classify-client.ongoing_requests?

Finally, it could be interesting to have a counter that is incremented for each successful request and labeled by the country of the client. Something like the name classify-client.location with tags like country:us, country:fr, etc?

To add the instrumentation, cadence is the most promising crate I can find. It supports statsd metric types along with the dogstatsd extension for tags and an efficient method of publishing them.

The host and port to send to should be configurable by environment variables and default to localhost and 8125.
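
To make the proposal concrete, here is a std-only sketch of the datagrams the three suggested metrics would produce in the DogStatsD text format (`name:value|type|#tag:value,...`). It also shows an increment-on-start, decrement-on-drop guard for ongoing_requests. It is not cadence's API; in practice cadence would build and send these. `dogstatsd_line`, `InFlight`, and `ongoing_requests_line` are hypothetical helpers.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Format one DogStatsD datagram, e.g. "classify-client.location:1|c|#country:us".
/// `kind` is the statsd type: "ms" (timer), "c" (counter) or "g" (gauge).
pub fn dogstatsd_line(name: &str, value: i64, kind: &str, tags: &[(&str, &str)]) -> String {
    let mut line = format!("{name}:{value}|{kind}");
    if !tags.is_empty() {
        let joined: Vec<String> = tags.iter().map(|(k, v)| format!("{k}:{v}")).collect();
        line.push_str("|#");
        line.push_str(&joined.join(","));
    }
    line
}

/// RAII guard for the ongoing_requests gauge: +1 when a request starts, -1 when
/// the guard is dropped after the response has been sent (or the handler unwinds).
pub struct InFlight {
    count: Arc<AtomicI64>,
}

impl InFlight {
    pub fn start(count: &Arc<AtomicI64>) -> InFlight {
        count.fetch_add(1, Ordering::Relaxed);
        InFlight { count: Arc::clone(count) }
    }
}

impl Drop for InFlight {
    fn drop(&mut self) {
        self.count.fetch_sub(1, Ordering::Relaxed);
    }
}

/// Gauge datagram reporting the current number of in-flight requests.
pub fn ongoing_requests_line(count: &AtomicI64) -> String {
    dogstatsd_line("classify-client.ongoing_requests", count.load(Ordering::Relaxed), "g", &[])
}
```

Reporting the gauge as an absolute value read from a shared atomic avoids depending on statsd's signed "delta gauge" semantics. Tying the decrement to `Drop` means early returns and errors cannot leave the count inflated.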

Add tests

Currently nothing has tests. Let's fix that. We don't need to get to 100% coverage, but it would be nice to test the important parts of the app.

  • classify endpoint #39
  • dockerflow endpoints #29
  • geoip lookups #42
  • logging #45
  • settings #44
  • error handling?

IM notifications for deployments

Set up IM notifications for the prod deployment process. They should alert the syseng team when the classify-client deployment pipeline runs to prod.

Security Checklist

Risk Management

  • The service must have performed a Rapid Risk Assessment and have a Risk Record bug
  • The service must be registered via a New Service issue

Infrastructure

  • Access and application logs must be archived for a minimum of 90 days
  • Use Modern or Intermediate TLS
  • Set HSTS to 31536000 (1 year)
    • strict-transport-security: max-age=31536000
    • If the service is not hosted under services.mozilla.com, it must be manually added to Firefox's preloaded pins. This only applies to production services, not short-lived experiments.
  • If the service has an admin panel, it must:
    • only be available behind Mozilla VPN (which provides MFA)
    • require Auth0 authentication

Development

  • Ensure your code repository is configured and located appropriately:
    • Applications built internally should be hosted in trusted GitHub organizations (mozilla, mozilla-services, mozilla-bteam, mozilla-conduit, mozilla-mobile, taskcluster). Sometimes we build and deploy applications we don't fully control. In those cases, the Dockerfile that builds the application container should be hosted in its own repository in a trusted organization.
    • Secure your repository by implementing Mozilla's GitHub security standard.
  • Sign all release tags, and ideally commits as well
    • Developers should configure git to sign all tags and upload their PGP fingerprint to https://login.mozilla.com
    • The signature verification will eventually become a requirement to shipping a release to staging & prod: the tag being deployed in the pipeline must have a matching tag in git signed by a project owner. This control is designed to reduce the risk of a 3rd party GitHub integration from compromising our source code.
  • Enable security scanning of 3rd-party libraries and dependencies
    • For Node.js, use npm audit with audit-filter to review and handle exceptions (see example in speech-proxy)
    • For Python, enable pyup security updates:
      • Add a pyup config to your repo (example config: https://github.com/mozilla-services/antenna/blob/master/.pyup.yml)
      • Enable branch protection for master and other development branches. Make sure the approved-mozilla-pyup-configuration team CANNOT push to those branches.
      • From the "add a team" dropdown for your repo /settings page
        • Add the "Approved Mozilla PyUp Configuration" team for your github org (e.g. for mozilla and mozilla-services)
        • Grant it write permission so it can make pull requests
      • notify [email protected] to enable the integration in pyup
  • Keep 3rd-party libraries up to date (in addition to the security updates)
  • Integrate static code analysis in CI, and avoid merging code with issues
    • JavaScript applications should use ESLint with the Mozilla ruleset
    • Python applications should use Bandit
    • Go applications should use the Go Meta Linter
    • Use whitelisting mechanisms in these tools to deal with false positives

Dual Sign Off

  • Services that push data to Firefox clients must require a dual sign off on every change, implemented in their admin panels
    • This mechanism must be reviewed and approved by the Firefox Operations Security team before being enabled in production

Logging

  • Publish detailed logs in mozlog format (APP-MOZLOG)
    • Business logic must be logged with app specific codes (see FxA)
    • Access control failures must be logged at WARN level

Web Applications

  • Must have a CSP with
    • a report-uri pointing to the service's own /__cspreport__ endpoint
    • web API responses should return default-src 'none'; frame-ancestors 'none'; base-uri 'none'; report-uri /__cspreport__ to disallow all content rendering and framing, and to report violations
    • if default-src is not 'none', frame-src and object-src should be 'none' or only allow specific origins
    • no use of unsafe-inline or unsafe-eval in script-src, style-src, and img-src
  • Third-party javascript must be pinned to specific versions using Subresource Integrity (SRI)
  • Web APIs must set a non-HTML content-type on all responses, including 300s, 400s and 500s
  • Set the Secure and HttpOnly flags on cookies, and use sensible expiration
  • Make sure your application gets an A+ on the Mozilla Observatory
  • Verify your application doesn't have any failures on the Security Baseline.
    • Contact secops@ or ping 'psiinon' on github to document exceptions to the baseline, mark csrf exempt forms, etc.
  • Web APIs should export an OpenAPI (Swagger) to facilitate automated vulnerability tests
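
For a JSON API like classify-client, the header requirements above (plus the HSTS value from the Infrastructure section) can be applied in one place. This matters because they then also cover 300s, 400s and 500s. The sketch below is framework-agnostic and models headers as a plain list of name/value pairs. `api_security_headers` and `apply_security_headers` are hypothetical; the header values are copied from the checklist, while the "don't override existing headers" policy is an assumption.

```rust
/// Security headers for a JSON API response, with values taken from the checklist.
pub fn api_security_headers() -> Vec<(&'static str, &'static str)> {
    vec![
        // Non-HTML content type for every response, including errors.
        ("Content-Type", "application/json"),
        (
            "Content-Security-Policy",
            "default-src 'none'; frame-ancestors 'none'; base-uri 'none'; report-uri /__cspreport__",
        ),
        // One year, as required in the Infrastructure section.
        ("Strict-Transport-Security", "max-age=31536000"),
    ]
}

/// Add any missing security headers (case-insensitive match on the name) without
/// overriding values the handler already set. Meant to run for every status code.
pub fn apply_security_headers(headers: &mut Vec<(String, String)>) {
    for (name, value) in api_security_headers() {
        if !headers.iter().any(|(n, _)| n.eq_ignore_ascii_case(name)) {
            headers.push((name.to_string(), value.to_string()));
        }
    }
}
```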

Security Features

  • Authentication of end-users should be via FxA. Authentication of Mozillians should be via Auth0/SSO. Any exceptions must be approved by the security team.
  • Session management should be via existing and well-regarded frameworks. In all cases you should contact the security team for a design and implementation review.
    • Store session keys server side (typically in a db) so that they can be revoked immediately.
    • Session keys must be changed on login to prevent session fixation attacks.
    • Session cookies must have HttpOnly and Secure flags set and the SameSite attribute set to 'strict' or 'lax' (which allows external regular links to login).
    • For more information about potential pitfalls see the OWASP Session Management Cheat Sheet
  • Forms that change state should use anti-CSRF tokens. Anti-CSRF tokens can be dropped for internal sites using SameSite session cookies where we are sure all users will be on Firefox 60+. Forms that do not change state (e.g. search forms) should use the 'data-no-csrf' form attribute.
  • Access Control should be via existing and well regarded frameworks. If you really do need to roll your own then contact the security team for a design and implementation review.
  • If you are building a core Firefox service, consider adding it to the list of restricted domains in the preference extensions.webextensions.restrictedDomains. This will prevent a malicious extension from being able to steal sensitive information from it, see bug 1415644.
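
classify-client itself has no sessions, but for services that do, the session rules above can be sketched in a few lines. The sketch keeps sessions server-side so they can be revoked, rotates the id on login, and sets the required cookie attributes. `session_cookie`, `SessionStore`, the cookie name and the `Path=/` attribute are all illustrative. The store is an in-memory map standing in for a database. Generating the session id is left to the caller and must use a cryptographically secure RNG, which std does not provide.

```rust
use std::collections::HashMap;

/// SameSite values the checklist allows for session cookies.
#[derive(Debug, Clone, Copy)]
pub enum SameSite {
    Strict,
    Lax,
}

/// Build a Set-Cookie value carrying the Secure and HttpOnly flags, a SameSite
/// attribute, and a bounded lifetime via Max-Age.
pub fn session_cookie(name: &str, session_id: &str, same_site: SameSite, max_age_secs: u64) -> String {
    let same_site = match same_site {
        SameSite::Strict => "Strict",
        SameSite::Lax => "Lax",
    };
    format!("{name}={session_id}; Path=/; Max-Age={max_age_secs}; Secure; HttpOnly; SameSite={same_site}")
}

/// Server-side session table (session id -> user), so any session can be revoked at once.
#[derive(Debug, Default)]
pub struct SessionStore {
    sessions: HashMap<String, String>,
}

impl SessionStore {
    /// Log `user` in under `new_id`, discarding any pre-login id so an attacker-planted
    /// id cannot survive authentication (session fixation). `new_id` must come from a
    /// cryptographically secure RNG supplied by the caller.
    pub fn login(&mut self, old_id: Option<&str>, user: &str, new_id: String) -> String {
        if let Some(old) = old_id {
            self.sessions.remove(old);
        }
        self.sessions.insert(new_id.clone(), user.to_string());
        new_id
    }

    /// Revoke a session immediately; returns whether it existed.
    pub fn revoke(&mut self, id: &str) -> bool {
        self.sessions.remove(id).is_some()
    }
}
```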

Databases

  • All SQL queries must be parameterized, not concatenated
  • Applications must use accounts with limited GRANTS when connecting to databases
    • In particular, applications must not use admin or owner accounts, to decrease the impact of a SQL injection vulnerability.

Common issues

  • User data must be escaped for the right context prior to reflecting it
    • When inserting user generated html into an html context:
      • Python applications should use Bleach
      • JavaScript applications should use DOMPurify
  • Apply sensible limits to user inputs, see input validation
    • POST body size should be small (<500kB) unless explicitly needed
  • When allowing users to upload or generate content, make sure to host that content on a separate domain (eg. firefoxusercontent.com, etc.). This will prevent malicious content from having access to storage and cookies from the origin.
    • Also use this technique to host rich content you can't protect with a CSP, such as metrics reports, wiki pages, etc.
  • When managing permissions, make sure access controls are enforced server-side
  • If an authenticated user accesses a protected resource, make sure the pages with those resources aren't cached and served up to unauthenticated users (like via a CDN).
  • If handling cryptographic keys, must have a mechanism to handle quarterly key rotations
    • Keys used to sign sessions don't need a rotation mechanism if destroying all sessions is acceptable in case of emergency.
  • Do not proxy requests from users without strong limitations and filtering (see Pocket UserData vulnerability). Don't proxy requests to link local, loopback, or private networks or DNS that resolves to addresses in those ranges (i.e. 169.254.0.0/16, 127.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10, 172.16.0.0/12, 192.168.0.0/16, 198.18.0.0/15).
  • Do not use target="_blank" in external links unless you also use rel="noopener noreferrer" (to prevent Reverse Tabnabbing)
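
The proxying rule above can be enforced with a destination check. Apply it to every address a target hostname resolves to, after resolution, and then connect to that same checked address so a second DNS answer cannot swap it out. The IPv4 ranges are exactly the checklist's list. Two parts go beyond the listed ranges and are labeled as additions: the IPv6 cases (loopback ::1, link-local fe80::/10, unique-local fc00::/7), and the unwrapping of IPv4-mapped IPv6 addresses. `is_blocked_destination` is a hypothetical name.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// IPv4 ranges from the checklist that must never be proxied to: (network, prefix length).
const BLOCKED_V4: [(Ipv4Addr, u32); 7] = [
    (Ipv4Addr::new(169, 254, 0, 0), 16), // link-local
    (Ipv4Addr::new(127, 0, 0, 0), 8),    // loopback
    (Ipv4Addr::new(10, 0, 0, 0), 8),     // private
    (Ipv4Addr::new(100, 64, 0, 0), 10),  // carrier-grade NAT
    (Ipv4Addr::new(172, 16, 0, 0), 12),  // private
    (Ipv4Addr::new(192, 168, 0, 0), 16), // private
    (Ipv4Addr::new(198, 18, 0, 0), 15),  // benchmarking
];

fn in_v4_range(ip: Ipv4Addr, net: Ipv4Addr, prefix: u32) -> bool {
    let mask = u32::MAX.checked_shl(32 - prefix).unwrap_or(0);
    (u32::from(ip) & mask) == (u32::from(net) & mask)
}

/// Returns true if a proxied request to `ip` must be refused.
pub fn is_blocked_destination(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => BLOCKED_V4.iter().any(|&(net, prefix)| in_v4_range(v4, net, prefix)),
        IpAddr::V6(v6) => {
            // IPv4-mapped addresses (::ffff:a.b.c.d) would otherwise bypass the IPv4 list.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_blocked_destination(IpAddr::V4(v4));
            }
            // Addition beyond the checklist: IPv6 loopback, link-local and unique-local.
            let first = v6.segments()[0];
            v6.is_loopback() || (first & 0xffc0) == 0xfe80 || (first & 0xfe00) == 0xfc00
        }
    }
}
```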

Add required routes for some additional geolocate work

Need to add some additional geolocate routes for future work.

  • /api/v1/country - return country_name and country_code in response, 404 if unable to find country
  • /api/v1/geolocate - return 401 for now, no authentication mechanism
  • /api/v1/geosubmit - return 403 for now, no authentication mechanism
  • /api/v1/submit - return 403 for now, no authentication mechanism
  • /api/v2/geosubmit - return 403 for now, no authentication mechanism

Add unit tests around /api/v1/country.

Profile the service to look for hotspots

I suspect we have a couple hot spots that could be easily optimized that we should profile. Specifically, I think that requesting the current time for every request is probably making a lot of syscalls which slow us down.

This issue covers profiling the existing code and filing optimization issues for any found hotspots.

Release v1.0.0

Welcome Release Captain ⛵

  • Create 1.1.0 Milestone (due date two weeks from now)
  • Move all open tickets in 1.0.0 to 1.1.0
  • Close 1.0.0 Milestone
  • Approve and merge all open dependabot PRs
  • Pull master
  • Ensure cargo test passes
  • git tag v1.0.0; git push --tags
  • @here deploying classify client v1.0.0 to stage https://github.com/mozilla/classify-client/milestone/1?closed=1 in #normandy
  • Quickly verify that stage loads with v1.0.0
  • @bpitts please push classify-client v1.0.0 to production in #normandy
  • Verify that prod loads with v1.0.0
  • Create release issue for Release v1.1.0
