Giter Club home page Giter Club logo

e2e_ds_project's Introduction

End-to-end Data Science project

This is the repo with the notebooks, code, and additional material used in the ITI's workshop. The goal of the sessions was to illustrate the end-to-end process of an real project.

Additional material

In addition to the notebooks and code, the following material is also available:

Problem statement

Our (fictional) client is an IT educational institute. They have reached out to us has reach out with the following: “IT jobs and technologies keep evolving quickly. This makes our field to be one of the most interesting out there. But on the other hand, such fast development confuses our students. They do not know which skills they need to learn for which job. “Do I need to learn C++ to be a Data Scientist?” “Do DevOps and System admins use the same technologies?” “I really like JavaScript; can I use it in Data Analytics?” Those are some of the questions that our students ask. Could you please develop a data-driven solution for our students to answer such questions? They mostly want to understand the relationships between the jobs and the technologies.


Level guide

Basic Intermediate Advanced
Business case Decide on the KPIs that you will positively influence Calculate the expected financial returns
Data collection Decide on and collect a suitable data source for your business case Decide on, collect and connect multiple data sources for better performance
Legal review Get basic information about the local data privacy law Study the local data privacy law
Cookie Cutter Create the standard directory structure
Git Use Git's GUI to track on master branch Use Git's CLI to track on Dev branch and merge back to Master Decide on a branching strategy and solve merge conflicts
Environments Install python packages using conda Create a dedicated conda environment Share your environment and install it on a different machine
Data cleaning Use basic statistics to filter out non-sense entries Use advanced statistics and unsupervised learning to filter out non-sense entries Calculate a 'sanity probability value' for each data point and use it later as the weight
Descriptive analytics Calculate summary statistics to provide data insights Produce visualizations to provide deeper understanding Apply unsupervised learning to provide even deeper understanding
Predictive analytics Create a single baseline model Create multiple hyper-tuned models. Benchmark their performance Combine the chosen models via ensemble and provide prediction confidence
Prescriptive analytics Recommend the action that the user should take
Software Engineering Refactor your notebooks to simple python scripts Create a production OOP class for predictions Expose your model using an API
MLops Export and load models from pickle files Track your models using Mlflow Create and run a docker image for your project
Product Create a Web App / GUI to expose prediction functionality Add the relevant historical insights, predictions and optimization results Collect users' feedback and retrain your model accordingly

e2e_ds_project's People

Contributors

deena-gergis avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.