
Data Validation for Data Science

This repository contains the data, code and Jupyter notebooks for a tutorial on data validation in data science projects. The tutorial has three sections, one for each step in the life cycle of a production data science model:

  1. Database management (using Great Expectations)
  2. Training pipeline (using Pandera)
  3. Model serving (using Pydantic)
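As a small taste of the model-serving step, the sketch below validates an incoming prediction request with Pydantic before it would reach a model. The class HouseFeatures, the helper parse_request, the three feature columns (taken from the Kaggle House Prices data) and their bounds are all hypothetical choices for illustration, not the tutorial's own code.

```python
from pydantic import BaseModel, Field, ValidationError


class HouseFeatures(BaseModel):
    """Hypothetical request body for a price-prediction endpoint."""

    # Feature choice and bounds are illustrative assumptions.
    LotArea: float = Field(gt=0)               # lot size, must be positive
    OverallQual: int = Field(ge=1, le=10)      # quality rating on a 1-10 scale
    YearBuilt: int = Field(ge=1800, le=2030)   # plausible construction years


def parse_request(payload: dict):
    """Return (validated model, []) or (None, list of validation errors)."""
    try:
        return HouseFeatures(**payload), []
    except ValidationError as exc:
        # exc.errors() lists every failing field, not just the first one.
        return None, exc.errors()
```

Rejecting a malformed request at this boundary means the model never sees out-of-range inputs; the caller gets a structured error listing each offending field instead.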

Each section comes with a notebook containing explanations, code snippets and exercises.

To watch me run through these notebooks at PyData London 2022, see this YouTube video: Data Validation for Data Science | PyData London 2022

Data

The dataset used in this tutorial is taken from the House Prices prediction competition on Kaggle. It consists of two CSV files located in the data folder: train.csv and test.csv.
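Before any library-based validation, a few plain pandas checks can confirm that the two files fit together. This is a minimal sketch assuming the usual Kaggle layout (an Id column in both files, and the SalePrice target present only in train.csv); the function check_split and its parameter names are hypothetical.

```python
import pandas as pd


def check_split(train: pd.DataFrame, test: pd.DataFrame,
                target: str = "SalePrice", id_col: str = "Id") -> list:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []

    # The target should appear in the training data only.
    if target not in train.columns:
        problems.append(f"train is missing target column {target!r}")
    if target in test.columns:
        problems.append(f"test unexpectedly contains target column {target!r}")

    # Apart from the target, both files should share the same feature columns.
    only_train = sorted(set(train.columns) - set(test.columns) - {target})
    only_test = sorted(set(test.columns) - set(train.columns) - {target})
    if only_train or only_test:
        problems.append(f"feature columns differ: only in train {only_train}, "
                        f"only in test {only_test}")

    # Row identifiers should be unique per file and never shared between files.
    if id_col in train.columns and id_col in test.columns:
        for name, df in (("train", train), ("test", test)):
            if not df[id_col].is_unique:
                problems.append(f"{name} has duplicate {id_col!r} values")
        shared = set(train[id_col]) & set(test[id_col])
        if shared:
            problems.append(f"{len(shared)} {id_col!r} value(s) appear in both files")

    return problems
```

Usage: check_split(pd.read_csv("data/train.csv"), pd.read_csv("data/test.csv")) returns the list of problems found, if any.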

Instructions

To follow the notebooks and exercises, there are two options:

  1. Use your own Python environment with Jupyter installed. Start Jupyter with the jupyter notebook command, open the notebook you want to run from the notebooks folder, and follow the instructions. To run the tools with all of their features available, Python 3.8 or later is recommended.
  2. Use Google Colaboratory, which requires no local installation. Go to the repository's GitHub page, choose one of the notebooks in the notebooks folder and, from the notebook view, click the link to open it in Colab.
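If you use your own environment, a quick check like the one below can confirm the Python version and that the tools are importable before you open a notebook. This is a hypothetical helper: check_environment and the module list are assumptions (great_expectations, pandera and pydantic are the usual import names of the three tools), so adjust the list as needed.

```python
import importlib.util
import sys

# Assumed import names for the tutorial's tools; edit to suit your setup.
TUTORIAL_MODULES = ("great_expectations", "pandera", "pydantic", "pandas")


def check_environment(modules=TUTORIAL_MODULES, min_python=(3, 8)):
    """Map 'python_ok' and each module name to True/False."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report
```

Calling check_environment() and looking for False values shows what to install before running the notebooks.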

Contributors

natanmish