data_linter

A Python package that validates datasets against a metadata schema, which is defined here. Try it out with our interactive demo.

It performs the following checks:

  • Are the columns of the correct data types? For untyped data formats such as CSV, can they be converted without error using pd.Series.astype?
  • Column names:
    • Are the columns named correctly?
    • Are they in the same order as specified in the metadata?
    • Are there any missing columns?
  • Where a regex pattern is provided in the metadata, does the actual data always fit the pattern?
  • Where an enum is provided in the metadata, does the actual data contain only values in the enum?
  • Where nullable is set to false in the metadata, are there really no nulls in the data?
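
To make the checks above concrete, here is a minimal sketch of the column-name, nullable, enum and pattern checks. It is not the package's implementation: the metadata layout (`columns`, `name`, `nullable`, `enum`, `pattern`) and the `lint` helper are assumptions made for illustration, and the data is a plain dict of column name to list of values rather than a DataFrame so that the sketch has no dependencies.

```python
import re

# Hypothetical metadata layout, assumed for illustration only;
# the package's real schema may differ.
meta = {
    "columns": [
        {"name": "id", "nullable": False},
        {"name": "status", "enum": ["open", "closed"]},
        {"name": "postcode", "pattern": r"^[A-Z]{1,2}\d"},
    ]
}

# Toy data: column name -> list of values (None marks a null).
data = {
    "id": [1, 2, None],
    "status": ["open", "closed", "pending"],
    "postcode": ["SW1A 2AA", "E1 6AN", "12345"],
}


def lint(data, meta):
    """Return a list of human-readable problems found in data."""
    problems = []
    expected = [col["name"] for col in meta["columns"]]
    actual = list(data)

    # Column names: report missing columns, otherwise compare names and order.
    missing = [name for name in expected if name not in actual]
    if missing:
        problems.append(f"missing columns: {missing}")
    elif actual != expected:
        problems.append(f"unexpected or misordered columns: {actual}")

    for col in meta["columns"]:
        name = col["name"]
        values = data.get(name)
        if values is None:
            continue  # already reported as missing
        non_null = [v for v in values if v is not None]

        # Nullable check: only enforced when nullable is explicitly False.
        if col.get("nullable") is False and len(non_null) < len(values):
            problems.append(f"{name}: contains nulls")

        # Enum check: every non-null value must be one of the listed values.
        if "enum" in col:
            bad = [v for v in non_null if v not in col["enum"]]
            if bad:
                problems.append(f"{name}: values not in enum: {bad}")

        # Pattern check: every non-null value must match the regex.
        if "pattern" in col:
            regex = re.compile(col["pattern"])
            bad = [v for v in non_null if not regex.search(str(v))]
            if bad:
                problems.append(f"{name}: values not matching pattern: {bad}")

    return problems


for problem in lint(data, meta):
    print(problem)
```

Collecting every problem in a list, rather than raising on the first one, mirrors the idea of running all checks and then reporting on them together.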

The package also provides impose_metadata_types_on_pd_df, which allows the user to safely convert a pandas DataFrame to the data types specified in the metadata. This is useful when you have an untyped data file such as a CSV and want to ensure it conforms to the metadata.
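
The idea behind this kind of conversion can be sketched with pd.Series.astype directly. This is an illustration under assumptions, not the package's code: the `impose_types` helper, the `TYPE_MAP` type names and the metadata layout are all hypothetical.

```python
import pandas as pd

# Hypothetical mapping from metadata type names to pandas dtypes,
# assumed for illustration; the package's real type names may differ.
TYPE_MAP = {"int": "int64", "double": "float64"}

# Hypothetical metadata layout.
meta = {
    "columns": [
        {"name": "id", "type": "int"},
        {"name": "price", "type": "double"},
    ]
}

# Values as they might arrive from a CSV read with every column as text.
df = pd.DataFrame({"id": ["1", "2"], "price": ["1.5", "2"]})


def impose_types(df, meta):
    """Return a copy of df cast to the metadata types, or raise ValueError."""
    out = df.copy()
    for col in meta["columns"]:
        name = col["name"]
        dtype = TYPE_MAP[col["type"]]
        try:
            out[name] = out[name].astype(dtype)
        except (ValueError, TypeError) as err:
            # Name the offending column instead of failing with a bare cast error.
            raise ValueError(
                f"column {name!r} cannot be cast to {col['type']}"
            ) from err
    return out


typed = impose_types(df, meta)
print(typed.dtypes)
```

Working on a copy means the caller's DataFrame is left untouched if any column fails to convert.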

Installation

pip install data_linter

Usage

For detailed information about how to use the package, please see the demo repo. This includes an interactive tutorial that you can run in your web browser.

Here's a very basic example:

import json

import pandas as pd

from data_linter.lint import Linter


def read_json_from_path(path):
    """Load a JSON file and return its parsed contents."""
    with open(path) as f:
        return json.load(f)


# Load the metadata schema and the dataset to validate.
meta = read_json_from_path("tests/meta/test_meta_cols_valid.json")
df = pd.read_parquet("tests/data/test_parquet_data_valid.parquet")

# Run every check, then produce a Markdown report of the results.
linter = Linter(df, meta)
linter.check_all()
linter.markdown_report()

Contributors

robinl, isichei, mandarinduck
