es-enrichment-sg

Enrichment - Python Lambdas.

Wrangler

The enrichment wrangler is the start of the process. It first picks up the sng data from S3 and invokes the method lambda with this data. The method response contains two dataframes (data and anomalies), which the wrangler splits out. The data is sent on to the SQS queue, whereas the anomalies are sent via an SNS topic.
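The flow above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `config` keys, the function name `run_wrangler`, and the assumption that the method returns JSON with `data` and `anomalies` keys are all hypothetical. The clients are expected to be boto3-style objects passed in by the caller.

```python
import json


def run_wrangler(s3, lambda_client, sqs, sns, config):
    """Fetch input from S3, invoke the method, then route its two outputs."""
    # 1. Pick up the input data from S3.
    s3_object = s3.get_object(Bucket=config["bucket"], Key=config["in_file"])
    data = json.loads(s3_object["Body"].read())

    # 2. Invoke the method lambda with the data and the packaged parameters.
    payload = {"RuntimeVariables": {"data": data, **config["parameters"]}}
    response = lambda_client.invoke(
        FunctionName=config["method_name"],
        Payload=json.dumps(payload),
    )
    # Assumed response layout: {"data": [...], "anomalies": [...]}.
    result = json.loads(response["Payload"].read())

    # 3. Split the response: data goes to SQS, anomalies go to SNS.
    sqs.send_message(
        QueueUrl=config["queue_url"],
        MessageBody=json.dumps(result["data"]),
    )
    sns.publish(
        TopicArn=config["topic_arn"],
        Message=json.dumps(result["anomalies"]),
    )
    return result
```

Passing the clients in as arguments keeps the sketch testable without AWS access; in a real lambda they would typically be created with `boto3.client(...)`.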

Method

The method is generic. As well as the data, it receives information about the lookups to use and survey-specific parameters. Example:

"RuntimeVariables": {
    "data":{ ...},
    "lookups":{
      "0": {
        "file_name": "responder_county_lookup_prod.json",
        "columns_to_keep": [
          "responder_id",
          "county"
        ],
        "join_column": "responder_id",
        "required": [
          "county"
        ]
      },
      "1": {
        "file_name": "county_lookup_county.json",
        "columns_to_keep": [
          "county_name",
          "region",
          "county",
          "marine"
        ],
        "join_column": "county",
        "required": [
          "region",
          "marine"
        ]
      }
    },
    "marine_mismatch_check": true,
    "period_column": "period",
    "identifier_column": "responder_id"
}

Lookups

The 'file_name' dictates which file to get from S3.
The 'columns_to_keep' lists the columns from the lookup that are joined onto the data.
The 'join_column' is the column used to join the lookup onto the data.
The 'required' columns are used later in the integrity checks, which confirm that no nulls exist in any required column.
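As an illustration of how one lookup entry might be applied, here is a minimal sketch. It uses plain lists of dicts in place of dataframes so that it runs without dependencies, and it assumes the lookup files have already been loaded from S3. The names `apply_lookup`, `enrich` and `lookup_files` are hypothetical, and applying the lookups in numeric key order is an assumption based on the example, where lookup "1" joins on a column supplied by lookup "0".

```python
def apply_lookup(data, lookup_rows, spec):
    """Left-join one lookup onto data, keeping only spec['columns_to_keep']."""
    join_column = spec["join_column"]
    keep = spec["columns_to_keep"]
    # Index the lookup rows by join key, keeping only the requested columns.
    index = {
        row[join_column]: {column: row.get(column) for column in keep}
        for row in lookup_rows
    }
    new_columns = [column for column in keep if column != join_column]
    enriched = []
    for record in data:
        match = index.get(record.get(join_column), {})
        merged = dict(record)
        for column in new_columns:
            # None marks a reference with no matching lookup row.
            merged[column] = match.get(column)
        enriched.append(merged)
    return enriched


def enrich(data, lookups, lookup_files):
    """Apply every lookup in numeric key order ("0", "1", ...)."""
    for key in sorted(lookups, key=int):
        spec = lookups[key]
        data = apply_lookup(data, lookup_files[spec["file_name"]], spec)
    return data
```

A left join is used so that references without a lookup match are kept with null values, which is what lets a later integrity check report them.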

Parameters

Parameters are taken from environment variables in the wrangler, packaged, and sent over to the method.

marine_mismatch_check - determines whether to run the marine mismatch check.
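A minimal sketch of the packaging step, assuming the environment variables carry the same names as the keys in the example payload (the actual variable names are not stated here, and `parameters_from_environment` is a hypothetical helper):

```python
import os


def parameters_from_environment(environ=None):
    """Collect the survey-specific parameters from environment variables."""
    environ = os.environ if environ is None else environ
    return {
        # Environment values are always strings, so the flag is parsed explicitly.
        "marine_mismatch_check": environ.get("marine_mismatch_check", "false").lower() == "true",
        "period_column": environ["period_column"],
        "identifier_column": environ["identifier_column"],
    }
```

The resulting dictionary can then be merged into "RuntimeVariables" alongside the data and lookups.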

Integrity Checks

There are two integrity checks in the method.

Missing Column Detector

The missing column detector builds a list of required columns from the lookups section of the input. It then filters the original dataset for any references where a required column is null, and outputs a list of those references along with the columns that have missing data.
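The check could look roughly like this. It is a sketch under assumptions: records are plain dicts rather than dataframe rows, and the output shape (one entry per reference with a hypothetical `missing_columns` list) is illustrative rather than the method's actual format.

```python
def required_columns(lookups):
    """Combine every lookup's 'required' list, without duplicates."""
    columns = []
    for key in sorted(lookups, key=int):
        for column in lookups[key]["required"]:
            if column not in columns:
                columns.append(column)
    return columns


def missing_column_detector(records, lookups, identifier_column):
    """Return one entry per reference that has a null in a required column."""
    required = required_columns(lookups)
    anomalies = []
    for record in records:
        missing = [column for column in required if record.get(column) is None]
        if missing:
            anomalies.append(
                {
                    identifier_column: record[identifier_column],
                    "missing_columns": missing,
                }
            )
    return anomalies
```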

Marine Mismatch Detector

Detects references that are producing marine but are from a county that doesn't produce marine. It checks the 'land_or_marine' column against a specified column (marine) to confirm that if 'land_or_marine' is M, the marine column is y.

The marine mismatch detector is only suitable for sand and gravel. So far, that is the only survey that differentiates between land and marine, so it is the only survey that would benefit from this check.
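A minimal sketch of the rule described above, again using plain dicts in place of dataframe rows. The function name and the `check_column` parameter are hypothetical; treating a null marine value as a mismatch is a choice made in this sketch, not something the description specifies.

```python
def marine_mismatch_detector(records, identifier_column, check_column="marine"):
    """Return references that report marine output from a non-marine county."""
    mismatches = []
    for record in records:
        produces_marine = record.get("land_or_marine") == "M"
        county_allows_marine = record.get(check_column) == "y"
        # Only marine producers are checked; land producers always pass.
        if produces_marine and not county_allows_marine:
            mismatches.append(record[identifier_column])
    return mismatches
```

In the wrangler and method flow, this would only run when marine_mismatch_check is true.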

Contributors

bitmonkey, dependabot[bot], dom-ford, glanvl, jordancooke, kingmushroom, krisrogos, lukeglanville, mkeating, piwington, thomashenson
