Data Science Portfolio by Dimitrios Effrosynidis

This portfolio is a compilation of notebooks that I created for data science tasks such as tutorials, exploratory data analysis, and machine learning. More notebooks will be added as I learn new things and find time to write about them.

Below is a summary of them.

🔥 Exploratory Data Analysis for the popular Battle Royale game PUBG

This is a very popular Kaggle kernel with more than 800 upvotes and 30,000 views, with which I won the 1st prize for the best kernel in that Kaggle competition.

๐Ÿก Clustering Neighborhoods

This project is meant as practice with several technologies and data science techniques.

Let's suppose that you live in Toronto, Canada (you can do this for any city that has enough data) and you have found a better job. This job is located on the other side of the city, so you decide to relocate closer. You really like your neighborhood, though, and you want to find a similar one.

This code uses the venues of each neighborhood as features in a clustering algorithm (k-means) and finds similar neighborhoods.

Things that were used

  1. Beautiful Soup - Package that lets us extract the content of a web page into simple text
  2. JSON - Handle JSON files and transform them into a pandas DataFrame
  3. Geocode - Package that converts an address to its coordinates
  4. scikit-learn - Machine learning package, used here for clustering
  5. Folium - Package to create spatial maps. NOTE: Maps created with Folium are not displayed in the Jupyter notebook, so I provide links to them as static images.
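
The clustering step described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the neighborhood names, venue categories, frequencies, the choice of two clusters, and the helper `similar_to` are all made up for the example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical data: share of venues per category in each neighborhood.
venues = pd.DataFrame(
    {
        "cafe": [0.50, 0.45, 0.05, 0.10],
        "park": [0.10, 0.15, 0.60, 0.55],
        "bar": [0.40, 0.40, 0.35, 0.35],
    },
    index=["Neighborhood A", "Neighborhood B", "Neighborhood C", "Neighborhood D"],
)

# n_clusters=2 is an arbitrary choice for this toy data set.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(venues)
clusters = pd.Series(labels, index=venues.index, name="cluster")


def similar_to(name):
    """Return the other neighborhoods that share a cluster with `name`."""
    same_cluster = clusters[clusters == clusters[name]].index
    return [n for n in same_cluster if n != name]


print(similar_to("Neighborhood A"))
```

With real data, the feature table would be built from the venue categories found near each neighborhood, and the number of clusters would need to be tuned.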

📙 Pandas Tutorial

Are you starting with Data Science? Pandas is perhaps the first thing you will need. And it's really easy!

After reading (and practising) this tutorial you will know how to:

  • Create, add, remove and rename columns
  • Read, select and filter data
  • Retrieve statistics for data
  • Sort and group data
  • Manipulate data
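
As a taste of the operations listed above, here is a small sketch on a made-up DataFrame. The column names and values are invented for illustration and are not taken from the tutorial.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cleo", "Dan"],
        "team": ["red", "blue", "red", "blue"],
        "score": [82, 67, 91, 75],
    }
)

# Create, rename, and remove columns.
df["passed"] = df["score"] >= 70
df = df.rename(columns={"score": "points"})
df = df.drop(columns=["passed"])

# Select and filter data: rows with more than 70 points, two columns only.
high = df.loc[df["points"] > 70, ["name", "points"]]

# Retrieve statistics.
mean_points = df["points"].mean()

# Sort and group data.
ranked = df.sort_values("points", ascending=False)
team_totals = df.groupby("team")["points"].sum()

print(high)
print(mean_points)
print(ranked)
print(team_totals)
```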

๐Ÿ“ Normalization and Standardization

Normalization and standardization are designed to achieve a similar goal: creating features that have similar ranges to each other. Both are widely used in data analysis to help the programmer extract insight from the raw data.

This notebook includes:

  • Normalization
  • Why normalize?
  • Standardization
  • Why standardize?
  • Differences?
  • When to use each and when not to
  • Python code for Simple Feature Scaling, Min-Max, Z-score, log1p transformation
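
The four transformations named above can be sketched with NumPy as shown below. The sample values are made up, and the z-score here uses the population standard deviation (NumPy's default), which may differ from the choice made in the notebook.

```python
import numpy as np

# Hypothetical feature values, including one large value.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# Simple feature scaling: divide by the maximum, so the largest value becomes 1.
simple = x / x.max()

# Min-max normalization: rescale to the range [0, 1].
min_max = (x - x.min()) / (x.max() - x.min())

# Z-score standardization: zero mean and unit (population) standard deviation.
z_score = (x - x.mean()) / x.std()

# log1p transformation: log(1 + x), which compresses large values.
log_scaled = np.log1p(x)

print(simple, min_max, z_score, log_scaled, sep="\n")
```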

🔧 Encoding Categorical Features

Python code on how to transform nominal and ordinal variables to integers.

This notebook includes:

  • Ordinal Encoding with LabelEncoder, pandas factorize, and pandas map
  • Nominal Encoding with One-Hot Encoding and Binary Encoding
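
A minimal sketch of three of these techniques using only pandas follows. The `size` and `color` columns and the `size_order` mapping are invented for the example; LabelEncoder and binary encoding are left out to keep it short.

```python
import pandas as pd

# Hypothetical data: one ordinal feature and one nominal feature.
df = pd.DataFrame(
    {
        "size": ["small", "large", "medium", "small"],
        "color": ["red", "green", "red", "blue"],
    }
)

# Ordinal encoding with an explicit mapping, so the integer order is meaningful.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_mapped"] = df["size"].map(size_order)

# factorize assigns integer codes in order of first appearance,
# which is not necessarily the natural order of the categories.
codes, uniques = pd.factorize(df["size"])
df["size_factorized"] = codes

# One-hot encoding for the nominal feature: one indicator column per category.
one_hot = pd.get_dummies(df["color"], prefix="color")

print(df)
print(one_hot)
```

Note how `map` preserves the small < medium < large order while `factorize` does not, which matters when the model treats the integers as magnitudes.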

📊 Visualizations with Seaborn

Every plot that seaborn provides is here, with examples on a real dataset.

This notebook includes:

  • Theory on Skewness and Kurtosis
  • Univariate plots. [Histogram, KDE, Box plot, Count plot, Pie chart]
  • Bivariate plots. [Scatter plot, Joint plot, Reg plot, KDE plot, Hex plot, Line plot, Bar plot, Violin plot, Boxen plot, Strip plot]
  • Multivariate plots. [Correlation Heatmap, Pair plot, Scatter plot, Line plot, Bar plot]

🕥 Feature Engineering with Dates

In this tutorial I present the datetime format that Pandas provides to handle datetime features. At the end, I create a function that generates 23 features from a single one.
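
To give an idea of the approach, here is a much smaller sketch that derives seven features from one datetime column through the pandas `.dt` accessor. The function `add_date_features`, the feature names, and the sample dates are assumptions for illustration; this is not the 23-feature function from the tutorial.

```python
import pandas as pd

# Hypothetical sample dates.
df = pd.DataFrame(
    {"date": pd.to_datetime(["2020-01-31", "2020-02-29", "2021-12-25"])}
)


def add_date_features(frame, column):
    """Return a copy of `frame` with a few features derived from `column`."""
    out = frame.copy()
    dates = out[column].dt
    out["year"] = dates.year
    out["month"] = dates.month
    out["day"] = dates.day
    out["day_of_week"] = dates.dayofweek  # Monday=0, Sunday=6
    out["is_weekend"] = dates.dayofweek >= 5
    out["is_month_end"] = dates.is_month_end
    out["is_leap_year"] = dates.is_leap_year
    return out


features = add_date_features(df, "date")
print(features)
```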
