Giter Club home page Giter Club logo

birdidq's Introduction

BirdiDQ - About πŸ”

BirdiDQ is an intuitive and user-friendly data quality application that allows you to run data quality checks on top of python great expectation open source library using natural language queries. Type in your requests, and BirdiDQ will generate the appropriate GE method, run the quality control and return the results along with data docs you need. Demo Video

BirdiDQ Screenshot 1

BirdiDQ Screenshot 2

Note

BirdiDQ is under development and is an open source project. Contributions are welcomed!

Features

  • πŸ” Data Exploration Made Easy: Quickly and interactively explore your data using a range of features like filters, comparisons, and more. Uncover hidden insights and make informed decisions with confidence.
  • 🎯 Natural Language Processing:: Speak BirdiDQ's language! No technical expertise required. Simply type in your queries, and BirdiDQ intelligently converts them into powerful Great Expectations methods (using a fine-tuned Large Language Model), saving you time and effort..
  • ⚑ Instant Results: Run comprehensive data quality checks on your selected data sources, and get instant feedback on data inconsistencies. BirdiDQ ensures that your data is reliable and trustworthy.
  • πŸ“§Automate Email Alert: Reach out to the Data Owner directly through the app, sending them an email with the detailed data quality report generated by Great Expectations.
  • GEN AI models: Uses finetuned LLM on customed expectations data.

Tech Stack

This app is an LLM-powered app built using:

Instalation instructions

To run BirdiDQ, you need to perform the following steps:

  1. Clone the repository locally:

     git clone https://github.com/BirdiD/BirdiDQ.git
    
  2. (Recommended) Create a virtual environment and activate it:

     python3 -m venv bir_env
     source bir_env/bin/activate
    
  3. Install the required dependencies:

     pip install -r requirements.txt
    
  4. Run the app:

     streamlit run great_expectations/app.py
    

Note: BirdiDQ can use OpenAI's ChatGPT or Falcon LLM to convert the natural language descriptions to expectations. If you plan to use Falcon, consider using Pytorch with GPU support for better performance. To install Pytorch with CUDA support follow the instructions avaiable at for your Operating System at Pytorch.

System requirements for local Falcon LLM usage

Falcon 7b is an open source large language model (LLM) that can be used with BirdiDQ to convert natural language descriptions to Great Expectations expectations. To use the current fine-tuned Falcon 7b, you need to have a system with the following minimum requirements:

  • If you don't have a GPU, you need at least 16GB of RAM to load the model into the memory. Inferencing will be really slow.
  • You need a GPU with at least 16GB of VRAM to load the model into the memory. Inferencing will be faster.

Example Queries

Here are some example queries you can try with BirdiDQ:

  • Ensure that at least 80% of the values in the country column are not null.
  • Check that none of the values in the address column match the pattern for an address starting with a digit.

BirdiDQ Integration Stack

BirdiDQ integrates, connects, and works with a range of tools and services.

  • Filesystem
    • Support Local Filesystem with Pandas
    • Support Local Filesystem with Spark
  • Database
    • Support PostgreSQL
    • Support BigQuery
    • Support Snowflake
    • Support Amazon Athena
    • Support AWS Redshift
  • Cloud
    • Connect to data on Amazon S3 using Pandas
    • Connect to data on Azure Blob storage using Pandas
    • Connect to data on GCS using Pandas

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.