

EDA with Data Visualization

This study focuses on improving visualization techniques throughout the EDA and feature engineering process that precedes model development. It uses the Google Play Store dataset, which contains information about apps across different categories.

In Kaggle notebooks, this dataset is generally used to predict the number of installs of an app from the given features. The focus of this study, however, is not on building a prediction model but on the techniques and details of the preprocessing that comes before it, because preprocessing is one of the most important stages of model development. Visualization techniques are especially helpful here: extracting information from the data guides what we can expect from the model and which features are likely to matter most for predicting the target.

This study does not describe the dataset in detail, but it provides all the techniques and code for data transformation, descriptive analysis, and visualization, so you can apply these techniques and perspectives before any model development process. The dataset contains both categorical and numeric values, so you can see how to deal with both kinds of features.
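
As a minimal sketch of handling both kinds of features, the snippet below splits a small hypothetical DataFrame into numeric and categorical columns with pandas `select_dtypes` and summarizes each group separately. The column names and values are illustrative assumptions, not the actual contents of the dataset or the notebooks.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has more columns and records.
df = pd.DataFrame(
    {
        "Category": ["GAME", "TOOLS", "GAME", "FAMILY"],
        "Type": ["Free", "Paid", "Free", "Free"],
        "Rating": [4.5, 3.9, 4.1, 4.7],
        "Reviews": [1200, 85, 43000, 560],
    }
)

# Split columns by dtype so each group gets a suitable summary.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Numeric features: central tendency and spread.
numeric_summary = df[numeric_cols].describe()

# Categorical features: frequency of each level.
category_counts = {col: df[col].value_counts() for col in categorical_cols}

print(numeric_summary)
print(category_counts["Category"])
```

Keeping the two column lists around makes it easy to pick a plot type later, for example histograms for the numeric columns and bar charts for the categorical ones.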

I hope this notebook will be a good resource for preprocessing and exploratory data analysis with visualization techniques.

Dataset

The dataset used in this study was obtained from Kaggle; you can reach it from this link. Only 'googleplaystro.csv' is used for this study. You can also find the dataset in the dataset folder. The dataset includes 13 features; the details are in the data transformation notebook.

The transformed dataset produced in the first phase has also been uploaded.

Environment

To install the dependencies needed to run the notebooks, you can use Anaconda. Once you have installed Anaconda, run:

$ conda env create -f environment.yml

Notebooks

The data-transformation.ipynb notebook includes all data cleaning and transformation steps.
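
To give a feel for this kind of cleaning, here is a small sketch of parsers for text-encoded numeric fields. It assumes value formats such as '10,000+' for installs, '19M' or '201k' for size, and '$4.99' for price, as commonly seen in the public Google Play Store data; the function names are hypothetical and the notebook may take a different approach.

```python
from typing import Optional


def parse_installs(value: str) -> Optional[int]:
    """Convert strings such as '10,000+' to an integer lower bound."""
    cleaned = value.replace(",", "").replace("+", "").strip()
    try:
        return int(cleaned)
    except ValueError:
        return None  # non-numeric text is treated as missing


def parse_size_mb(value: str) -> Optional[float]:
    """Convert '19M' or '512k' to megabytes; return None for other text."""
    text = value.strip()
    units = {"M": 1.0, "k": 1.0 / 1024}  # assumed unit suffixes
    if not text or text[-1] not in units:
        return None
    try:
        return float(text[:-1]) * units[text[-1]]
    except ValueError:
        return None


def parse_price(value: str) -> Optional[float]:
    """Convert '$4.99' or '0' to a float; return None for other text."""
    try:
        return float(value.replace("$", "").strip())
    except ValueError:
        return None


print(parse_installs("10,000+"))
print(parse_size_mb("19M"))
print(parse_price("$4.99"))
```

With pandas, functions like these could be applied column by column, for example `df["Installs"].map(parse_installs)`, where the column name is again an assumption about the raw file.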

The eda-visualization.ipynb notebook includes all visualization techniques for univariate and bivariate analysis.
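
The difference between univariate and bivariate plots can be sketched as follows, using matplotlib on a small hypothetical DataFrame. The data, column names, and choice of plot types are assumptions made for illustration; the notebook itself may use other libraries and charts.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data standing in for the transformed dataset.
df = pd.DataFrame(
    {
        "Category": ["GAME", "TOOLS", "GAME", "FAMILY", "TOOLS", "FAMILY"],
        "Rating": [4.5, 3.9, 4.1, 4.7, 4.0, 4.3],
    }
)

fig, (ax_uni, ax_bi) = plt.subplots(1, 2, figsize=(10, 4))

# Univariate: distribution of a single numeric feature.
ax_uni.hist(df["Rating"], bins=5, edgecolor="black")
ax_uni.set_title("Rating distribution")
ax_uni.set_xlabel("Rating")
ax_uni.set_ylabel("Number of apps")

# Bivariate: a numeric feature compared across the levels of a categorical one.
groups = {name: grp["Rating"].tolist() for name, grp in df.groupby("Category")}
ax_bi.boxplot(list(groups.values()))
ax_bi.set_xticks(range(1, len(groups) + 1))
ax_bi.set_xticklabels(list(groups.keys()))
ax_bi.set_title("Rating by category")
ax_bi.set_xlabel("Category")
ax_bi.set_ylabel("Rating")

fig.tight_layout()
```

In a notebook the figure renders inline; in a script, call `plt.show()` or `fig.savefig(...)` to see the result.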

Proposed Resources

Several resources helped throughout this study, but the Exploratory Data Analysis with Python Cookbook by Ayodele Oluleye was especially helpful in showing how to approach data once it is visualized. It is a strongly recommended resource. You can find the other resources:

Contribution

If you want to contribute, please send a pull request. All contributions are welcome!

Please check the repository for updates, to open issues, or to send pull requests.
