Ecce Homo

Ecce Homo aims to be an automatic tool for machine learning workflow tasks, often called AutoML. For an example, see the eccehomo folder, where you will find an IPython Notebook with an implementation on the Titanic dataset. This is part of the final capstone project of the Udacity Machine Learning Engineer Nanodegree. For more information, visit Udacity or see either proposal.pdf or project_report.pdf.

Why Ecce Homo?

If you want to easily establish a benchmark for a model you are working on, try Ecce Homo to quickly implement a machine learning model. Also, if you are a newcomer to machine learning and want to create your first product, give it a try.

What can Ecce Homo do?

It can perform automatic exploratory analysis tasks and export the results to an HTML file. These include:

  • Dataset description: number of rows and columns, and names of the features.
  • Summary statistics: mean, median, first and third quartiles, minimum and maximum values, and count.
  • Group-by aggregations on categorical variables, for fast decisions and understanding of data classes.
  • Boxplots across categorical variables, for a deeper understanding of how numerical data is distributed, including with respect to the target class.
  • Histograms of continuous data, to understand its distribution and compare distributions between the classes of the target variable.
  • A brief overview of missing values in each variable.
  • Correlations: a correlation matrix of all numerical variables, plus a heatmap to visually find highly correlated features.
  • Pairwise scatter plots of n random features (so that high-dimensional data stays easy to visualize), to look for patterns in the data.
  • Bar plots of categorical data, to understand how classes are distributed.
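
To give a rough idea of what several of the tabular summaries above involve, here is a minimal sketch written directly in pandas. It is not Ecce Homo's own code or API; the toy DataFrame and all variable names are assumptions made for illustration only.

```python
import pandas as pd

# Toy data standing in for a real dataset (values are illustrative only).
df = pd.DataFrame({
    "age": [22.0, 38.0, None, 35.0, 54.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86],
    "sex": ["male", "female", "female", "female", "male"],
    "survived": [0, 1, 1, 1, 0],
})

# Dataset description: number of rows and columns, feature names.
n_rows, n_cols = df.shape
feature_names = list(df.columns)

# Summary statistics (count, mean, quartiles, min, max) for numeric columns.
summary = df.describe()

# Number of missing values per variable.
missing = df.isnull().sum()

# Group-by aggregation on a categorical variable against the target.
grouped = df.groupby("sex")["survived"].mean()

# Correlation matrix over the numeric columns only.
corr = df.select_dtypes("number").corr()

# Any of these tables can be rendered as an HTML fragment for a report.
html = summary.to_html()
```

The plots in the list (boxplots, histograms, heatmap, scatter and bar plots) would be drawn on top of tables like these, for example with matplotlib or seaborn.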

You can also perform automatic data imputation in just four lines of code, handling the train and test sets separately so that the test set keeps its integrity. First, instantiate the DataImputation class:
imputation = eccehomo.DataImputation(X_train, X_test)
Define the imputation for the train set:
X_train = imputation.user_defined(defined_methods = {"Embarked":"mode", "Cabin":"indicator", "Age":"knn"}, indicator = 'other')
Perform the imputation on the test set, using either the same method or the same value as in train:
X_test = imputation.test_imputation(imputation.imputed_values, use_value = True)
Finally, print information about the imputed values:
imputation.imputed_values
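
The reason for treating train and test separately is that fill values should be learned from the training data only and then reused on the test data. The sketch below illustrates that idea in plain pandas, with mode and median standing in for the methods above. It does not reproduce DataImputation's internals, and the toy data and the `fill_values` name are assumptions.

```python
import pandas as pd

# Toy train and test sets with missing values (illustrative only).
X_train = pd.DataFrame({
    "Embarked": ["S", "C", "S", None],
    "Age": [22.0, None, 30.0, 40.0],
})
X_test = pd.DataFrame({
    "Embarked": [None, "Q"],
    "Age": [None, 18.0],
})

# Learn the fill values from the training set only.
fill_values = {
    "Embarked": X_train["Embarked"].mode()[0],  # most frequent category
    "Age": X_train["Age"].median(),             # median of observed ages
}

# Apply the same training-derived values to both sets, so that no
# statistic computed on the test set influences the imputation.
X_train_imputed = X_train.fillna(fill_values)
X_test_imputed = X_test.fillna(fill_values)
```

Reusing the stored training values on the test set is the same idea as passing `use_value = True` in the call above.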

You can perform outlier detection using isolation forest, DBSCAN, discretization or the boxplot method, with one line of code:
X = eccehomo.Outlier(X_train).isolation_forest(max_outliers = 1000)
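
Of the methods listed, the boxplot method is the simplest to show by hand. The sketch below assumes the usual boxplot rule, where points beyond 1.5 times the interquartile range from the quartiles count as outliers. The helper `iqr_outlier_mask` is a hypothetical name, not part of Ecce Homo, and the library's own rule may differ.

```python
import pandas as pd

def iqr_outlier_mask(series, k=1.5):
    """Return a boolean mask that is True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR], the usual boxplot whisker rule."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return (series < lower) | (series > upper)

# Toy numeric column with one extreme value (illustrative only).
fares = pd.Series([7.25, 7.92, 8.05, 8.46, 13.0, 26.0, 512.33])

mask = iqr_outlier_mask(fares)
cleaned = fares[~mask]  # keep only the rows not flagged as outliers
```

Here only the value 512.33 falls outside the whiskers, so `cleaned` keeps the other six values.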

You can also perform Bayesian optimization to search a predefined hyperparameter space more efficiently, using one line of code:
model = eccehomo.Modeling(X_train, y_train, algo = "RandomForest", iter = 50, scoring='precision').optimize()
For more information on Bayesian optimization, please visit: https://github.com/fmfn/BayesianOptimization.

What you need:

To run the examples found in the examples folder, you need the following libraries and datasets.

  • scikit-learn 0.22.1
  • pandas 0.25.1
  • imblearn 0.6.1
  • matplotlib 3.1.3
  • seaborn 0.9.0
  • numpy 1.17.2
  • bayesian-optimization, installed through:
    $ pip install bayesian-optimization

Datasets can be found in the eccehomo/name/data folder or on the corresponding Kaggle competition pages.

License

This project is released under the MIT license.

What is missing?

  • Unit tests for the workflow.
  • Improved data imputation and model hyperparameter tuning.
  • Work on the dataset balancing module.
  • Model evaluation.
  • Unit tests for the repo and library.
