onsdigital.es-disclosure-sg's Introduction

es-disclosure

The disclosure module lets the user specify which stages to run against the data, e.g. 1, 2, 5. Stages 1, 2 and 5 are currently implemented; stages 3 and 4 are in development and are present only as mocks.

The disclosure methods rely on several aggregations having been produced by the previous step. Refer to aggregation for more information.

Wrangler

Disclosure uses a single wrangler to orchestrate which method stages are triggered. The stages are specified via the disclosure_stages runtime variable.
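As a rough illustration of how a wrangler might interpret disclosure_stages, the sketch below assumes the variable arrives as a comma-separated string such as "1, 2, 5"; the exact format is not stated in this README. The helper name parse_disclosure_stages and the VALID_STAGES set are hypothetical and not taken from the module.

```python
# Hypothetical helper: the real wrangler may parse this variable differently.
VALID_STAGES = {1, 2, 3, 4, 5}


def parse_disclosure_stages(raw):
    """Turn a string such as "1, 2, 5" into a sorted list of stage numbers."""
    stages = set()
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas or doubled separators
        stage = int(token)  # raises ValueError for non-numeric input
        if stage not in VALID_STAGES:
            raise ValueError(f"Unknown disclosure stage: {stage}")
        stages.add(stage)
    return sorted(stages)


print(parse_disclosure_stages("1, 2, 5"))  # [1, 2, 5]
```

Sorting the result means the stages would run in ascending order regardless of how the user lists them; whether the real wrangler enforces an order is an assumption here.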

Common Environment Variables

Each wrangler has these variables:
bucket_name: - The name of the bucket used to store data.
method_name: - The method that this wrangler calls.

Runtime variables

These are the runtime variables that need to be present for the module to work correctly.
disclosivity_marker: - Marks if the data is disclosive or not.
publishable_indicator: - Marks if the data should be published or not.
explanation: - The reason why something has been marked as disclosive.
total_column: - The name which is used for the total column in aggregation.
parent_column: - The name of the column holding the reference of the parent company.
threshold: - The threshold used in the calculation of one of the disclosure calculations.
cell_total_column: - The name given to the cell total column.
top1_column: - The name of the column that holds the largest contributor cell.
top2_column: - The name of the column that holds the second largest contributor cell.
stage5_threshold: - The threshold used in the stage 5 disclosure calculation.
disclosure_stages: - The stages of disclosure you wish to run e.g. 1, 2, 5.
in_file_name: - The default input file name to get from S3 (this is the previous method's out_file_name).
out_file_name: - The path and name of the file you wish to save the CSV as.
sns_topic_arn: - The SNS topic to send summary information to.

General process:

  • Collect the data from S3.
  • Turn the input data into a dataframe.
  • Pass the input dataframe to the appropriate method.
  • Send the data returned from the method to S3.
  • Send summary info to SNS.
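The steps above can be sketched as a small orchestration function. This is only an outline under stated assumptions: the four callables (read_file, invoke_method, write_file, send_summary) are hypothetical stand-ins for the S3, Lambda and SNS calls, disclosure_stages is assumed to be an already-parsed list of integers, and a list of JSON records stands in for the dataframe so that the flow can be run without AWS access.

```python
import json


def run_wrangler(runtime, read_file, invoke_method, write_file, send_summary):
    """Run the requested stages in order, feeding each output into the next.

    All four callables are hypothetical placeholders for the real AWS calls.
    """
    stages = runtime["disclosure_stages"]

    # Collect the input and load it (records stand in for the dataframe).
    records = json.loads(read_file(runtime["in_file_name"]))

    # Pass the data to each requested stage method in turn.
    for stage in stages:
        result = invoke_method(f"stage{stage}_method", json.dumps(records))
        if not result["success"]:
            raise RuntimeError(f"Stage {stage} failed: {result['error']}")
        records = json.loads(result["data"])

    # Save the final output, then publish summary information.
    write_file(runtime["out_file_name"], json.dumps(records))
    send_summary(runtime["sns_topic_arn"],
                 f"Disclosure stages {stages} completed")
    return records
```

Injecting the I/O functions keeps the control flow testable in isolation; a real wrapper would presumably bind them to its AWS client calls.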

Methods

The methods perform the actual disclosure calculation. Each contains a function called disclosure, which uses apply() to run a given test against each row of the dataframe. Once the test has been applied, the dataframe is returned.

Stage 1

Name of Lambda:

stage1_method

Intro:

Checks whether the total for the cell is 0 or rounds to 0.

Inputs:

data: input data.
disclosivity_marker: The name of the column to put 'disclosive' marker.
publishable_indicator: The name of the column to put 'publish' marker.
explanation: The name of the column to put reason for pass/fail.
cell_total_column: The name of the column holding the cell total.
total_columns: The names of the columns holding the cell totals.
contributor_reference: The name of the column holding the contributor id.

Outputs:

final_output: Dict containing either:
{"success": True, "data": < stage 1 output - json >}
{"success": False, "error": < error message - string >}

Stage 2

Name of Lambda:

stage2_method

Intro:

Checks whether the number of distinct enterprise references in a cell is at least a given threshold.

Inputs:

data: input data.
disclosivity_marker: The name of the column to put 'disclosive' marker.
publishable_indicator: The name of the column to put 'publish' marker.
explanation: The name of the column to put reason for pass/fail.
parent_column: The name of the column holding the count of parent companies.
threshold: The threshold above which a row is not disclosive.
total_columns: The names of the columns holding the cell totals. Included so that the correct disclosure columns are used.
contributor_reference: The name of the column holding the contributor id.

Outputs:

final_output: Dict containing either:
{"success": True, "data": < stage 2 output - json >}
{"success": False, "error": < error message - string >}

Stage 3

N/A

Stage 4

N/A

Stage 5

Name of Lambda:

stage5_method

Intro:

The specifics of this test are omitted because they may be considered sensitive.

Inputs:

data: input data.
disclosivity_marker: The name of the column to put 'disclosive' marker.
publishable_indicator: The name of the column to put 'publish' marker.
explanation: The name of the column to put reason for pass/fail.
cell_total_column: The name of the column holding the cell total.
top1_column: The name of the column holding the largest contributor to the cell.
top2_column: The name of the column holding the second largest contributor to the cell.
total_columns: The names of the columns holding the cell totals. Included so that the correct disclosure columns are used.
contributor_reference: The name of the column holding the contributor id.

Outputs:

final_output: Dict containing either:
{"success": True, "data": < stage 5 output - json >}
{"success": False, "error": < error message - string >}

Contributors

ajorpheus, dom-ford, glanvl, jordancooke, kingmushroom, krisrogos, mkeating, piwington, rhodriguerrier
