This portfolio is a compilation of the data science and data analysis projects I have done for academic, self-learning, and hobby purposes. It also contains my achievements, skills, and certificates, and it is updated on a regular basis.
GitHub, LinkedIn, Facebook, Twitter, Instagram
Place your orders at: Fiverr
- Email: [email protected]
- Recipient of Outstanding Master of Engineering - Industrial Engineering Student Award.
- Publication: Prognosis of Wind Turbine Gearbox Bearing Failures using SCADA and Modeled Data, Proceedings of the Annual Conference of the PHM Society 2020, Vol. 12 No. 1.
- Winner of TAMU Datathon 2020 among 50+ teams.
- Recipient of TAMU Scholarship and Fee Waiver for excellent academic performance (4.0 GPA).
Traffic Sign Classification Using Deep Learning
Traffic sign classification is an important task for self-driving cars. In this project, the dataset contains 43 different classes of images. The project covers the following steps:
- Understand the theory and intuition behind deep learning and Convolutional Neural Networks (CNNs).
- Import key Python libraries and the dataset, and perform image visualization.
- Perform image normalization and convert images from color to grayscale.
- Build a Convolutional Neural Network using Keras with TensorFlow 2.0 as the back end.
- Compile and fit the deep Convolutional Neural Network model to the training data.
- Assess the performance of the trained model and check its generalization using various KPIs.
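As an illustration of the preprocessing step only, grayscale conversion and normalization can be sketched in plain Python. This is a minimal sketch, not the project's code: the channel-averaging rule, the `(value - 128) / 128` scaling, and the function names are assumptions chosen for the example, and a real pipeline would normally use NumPy arrays.

```python
def to_grayscale(image):
    """Average the R, G, B channels of each pixel.

    `image` is a list of rows, each row a list of (r, g, b) tuples (assumed layout).
    """
    return [[sum(pixel) / 3.0 for pixel in row] for row in image]


def normalize(gray, center=128.0, scale=128.0):
    """Shift and scale 0-255 intensities to roughly the range [-1, 1)."""
    return [[(value - center) / scale for value in row] for row in gray]


# A tiny 2x2 made-up "image" used only to exercise the functions.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(128, 128, 128), (255, 0, 0)],
]
gray = to_grayscale(image)
normalized = normalize(gray)
```

Centering the inputs around zero is a common choice because it tends to make gradient-based training better behaved.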
Customer Survival Analysis and Churn Prediction
In this project, I used survival analysis to study how the likelihood of customer churn changes over time. I also implemented a Random Forest model to predict customer churn and deployed it as a Flask web app on Heroku. App
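To make the survival-analysis idea concrete, here is a minimal sketch of a Kaplan-Meier estimate of the probability that a customer has not yet churned. The Kaplan-Meier estimator is my choice for the illustration; the function name, the input layout, and the toy tenures are assumptions and are not taken from the project.

```python
def kaplan_meier(durations, churned):
    """Return [(time, survival_probability)] at each time with at least one churn.

    durations: tenure of each customer (e.g. in months).
    churned:   True if the customer churned at that time, False if still
               active when observation stopped (censored).
    """
    events = sorted(zip(durations, churned))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        churn_count = 0
        leaving = 0
        # Group all customers whose tenure ends at the same time t.
        while i < len(events) and events[i][0] == t:
            churn_count += 1 if events[i][1] else 0
            leaving += 1
            i += 1
        if churn_count:
            survival *= 1.0 - churn_count / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up tenures for six customers; three of them are censored.
curve = kaplan_meier(
    durations=[2, 3, 3, 5, 8, 8],
    churned=[True, True, False, True, False, False],
)
```

Censored customers still count in the at-risk group until they leave observation, which is what distinguishes this estimate from a plain churn rate.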
Instacart Market Basket Analysis
The objective of this project is to analyze 3 million grocery orders from more than 200,000 Instacart users and predict which previously purchased items will be in a user's next order. Customer segmentation and affinity analysis are also performed to study user purchase patterns.
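Affinity analysis can be illustrated with the lift of item pairs, which compares how often two items are bought together with how often that would happen if they were independent. The sketch below is an assumption-laden example: the `pair_lift` helper and the four toy orders are hypothetical and do not come from the Instacart data.

```python
from collections import Counter
from itertools import combinations


def pair_lift(orders):
    """Return {(item_a, item_b): lift} for every pair that co-occurs in an order."""
    n = len(orders)
    item_counts = Counter()
    pair_counts = Counter()
    for order in orders:
        items = sorted(set(order))  # de-duplicate and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    lift = {}
    for (a, b), count in pair_counts.items():
        support_ab = count / n
        lift[(a, b)] = support_ab / ((item_counts[a] / n) * (item_counts[b] / n))
    return lift


# Made-up orders used only to exercise the function.
orders = [
    ["banana", "milk"],
    ["banana", "milk", "bread"],
    ["bread"],
    ["banana"],
]
lift = pair_lift(orders)
```

A lift above 1 suggests the pair is bought together more often than chance would predict; a lift below 1 suggests the opposite.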
Hybrid-filtering News Articles Recommendation Engine
A hybrid-filtering personalized news article recommendation system that suggests articles from popular news service providers based on the reading history of Twitter users who share similar interests (collaborative filtering) and on the content similarity between an article and the user’s tweets (content-based filtering).
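One simple way to combine the two signals is a weighted blend of a collaborative score and a content score. The sketch below assumes that approach for illustration; the bag-of-words cosine similarity, the `alpha` weight, and the function names are hypothetical and may differ from how the project combines them.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hybrid_score(collab_score, content_score, alpha=0.5):
    """Blend the two scores; alpha is the weight given to collaborative filtering."""
    return alpha * collab_score + (1 - alpha) * content_score


# Made-up inputs: collab_score stands in for "share of similar users who read it".
content = cosine_similarity(
    "wind turbine gearbox failure", "new wind turbine installed offshore"
)
score = hybrid_score(collab_score=0.4, content_score=content, alpha=0.6)
```

Keeping the blend explicit makes it easy to fall back on content similarity alone for users with no similar neighbors by setting `alpha` to 0.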
Predictive Maintenance of Aircraft Engine
In this project, I used models such as RNN, LSTM, and 1D-CNN to predict engine failure 50 cycles ahead of time, and calculated feature importance from them using sensitivity analysis and SHAP values. Exponential degradation and similarity-based models are also used to estimate the engine's remaining life.
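Predicting failure 50 cycles ahead is usually framed as a labeling problem on run-to-failure data. The sketch below shows one plausible way to derive the targets, assuming each engine is observed from cycle 1 until the cycle at which it fails; the function name and that data layout are assumptions for the example.

```python
def label_cycles(num_cycles, horizon=50):
    """Return (remaining_life, labels) for one engine run to failure.

    remaining_life[i] is the number of cycles left after cycle i + 1.
    labels[i] is 1 when failure occurs within `horizon` cycles, else 0.
    """
    remaining_life = [num_cycles - cycle for cycle in range(1, num_cycles + 1)]
    labels = [1 if left <= horizon else 0 for left in remaining_life]
    return remaining_life, labels


# A hypothetical engine that fails at cycle 120.
remaining_life, labels = label_cycles(num_cycles=120, horizon=50)
```

The binary labels suit the classification models, while the remaining-life values are the natural target for the degradation and similarity-based models.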
Wind Turbine Power Curve Estimation
In this project, I employed regression techniques to estimate the power curve of an onshore wind turbine. Nonlinear tree-based regression methods perform best because the true power curve is nonlinear. XGBoost, tuned with GridSearchCV, yields the lowest test RMSE of 6.404.
The objective of this project is to identify the in-control data points and eliminate the out-of-control data points in order to set up distribution parameters for manufacturing process monitoring. I used PCA for dimension reduction, and Hotelling T2 and m-CUSUM control charts to establish the mean and variance matrices.
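The idea of iteratively discarding out-of-control points can be sketched with the Hotelling T2 statistic for two variables. This is a simplified illustration under stated assumptions: it works on two-dimensional points only, the `limit` argument is a placeholder for a properly derived control limit, and the helper names are hypothetical.

```python
def mean_and_cov(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def t_squared(point, mean, cov):
    """Hotelling T2: (x - mean)^T S^-1 (x - mean), using the 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def phase_one(points, limit):
    """Drop points whose T2 exceeds `limit`, re-estimate, and repeat until stable."""
    current = list(points)
    while True:
        mean, cov = mean_and_cov(current)
        kept = [p for p in current if t_squared(p, mean, cov) <= limit]
        if len(kept) == len(current):
            return mean, cov, kept
        current = kept


# Made-up data: eight well-behaved points plus one obvious outlier.
data = [(0, 0), (2, 0), (0, 2), (2, 2)] * 2 + [(10, 10)]
mean, cov, kept = phase_one(data, limit=5.0)
```

Re-estimating after each removal matters because a single extreme point inflates the covariance and can mask other unusual points.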
The objective of this project is to perform a predictive assessment of the GDP of India through an inferential analysis of various socio-economic factors. Various models are compared, and a Stepwise Regression model is implemented, which resulted in a 5.7% test MSE.
In this project, I applied various classification models such as Logistic Regression, Random Forest, and LightGBM to detect consumers who will default on a loan. SMOTE is used to combat class imbalance, and LightGBM achieved the highest accuracy (98.89%) with an F1 score of 0.99.
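SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbors. The sketch below shows that core idea in plain Python under simplifying assumptions (brute-force neighbor search, tuples as feature vectors, hypothetical function name); a real project would typically rely on a maintained library implementation.

```python
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate n_synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # k nearest minority neighbors of the base point (squared Euclidean distance).
        others = [p for i, p in enumerate(minority) if i != base_index]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Made-up minority-class samples with two features each.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote(minority, n_synthetic=6)
```

Oversampling should be applied to the training split only, so that the evaluation data keeps its original class balance.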
I worked with the COVID19 dataset published by Johns Hopkins University, which contains the cumulative number of confirmed cases per day in each country. Another dataset consists of various life factors scored by the people living in each country around the globe.
- Introduction: Understand the purpose of the project, the datasets that will be used, and the question we will answer with our analysis.
- Importing COVID19 dataset: Import the COVID19 dataset and prepare it for analysis by dropping columns and aggregating rows.
- Finding a good measure: Decide on and calculate a good measure for our analysis.
- Importing World Happiness Report dataset: Import the World Happiness Report dataset, drop unneeded columns, and merge it with the COVID19 dataset to find correlations in the data.
- Visualizing the results: Visualize the results using Seaborn.
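The measure, merge, and correlation steps can be sketched as follows. The text does not say which measure was chosen, so the sketch assumes one plausible option, the largest single-day increase in cumulative cases; the country names, numbers, and helper names are all made up for the example.

```python
import math


def max_daily_increase(cumulative_cases):
    """Largest day-over-day jump in a cumulative case series (assumed measure)."""
    return max(
        later - earlier
        for earlier, later in zip(cumulative_cases, cumulative_cases[1:])
    )


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy, made-up numbers keyed by placeholder country names.
cumulative = {"A": [0, 1, 4, 10], "B": [0, 2, 3, 4], "C": [0, 5, 15, 30]}
life_factor = {"A": 5.0, "B": 4.0, "C": 7.0, "D": 6.0}

measure = {country: max_daily_increase(s) for country, s in cumulative.items()}
# Inner join: keep only countries present in both datasets.
common = sorted(set(measure) & set(life_factor))
r = pearson([measure[c] for c in common], [life_factor[c] for c in common])
```

A correlation computed this way describes association between countries only and does not by itself say anything about cause.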
- Genetic Algorithm: In this file, I implemented a simple genetic algorithm that finds a list of numbers that sum to any specified number.
- Bayesian Statistics: In this file, I explored how Bayesian statistics works and how prior assumptions affect posterior probabilities, using a gun control example.
- Gaussian Mixture Model and Expectation Maximization: In this file, I implemented the Expectation Maximization algorithm to recover the true distribution of a one-dimensional GMM of two Gaussians.
- Linear Regression: In this file, I solve linear regression using the analytical method and also by implementing the gradient descent, stochastic gradient descent, and mini-batch gradient descent algorithms.
- Neural Network Implementation: In this file, I implemented a simple neural network using forward propagation, backward propagation, and optimization functions to predict customer churn.
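For the linear regression item, the contrast between the analytical solution and gradient descent can be sketched for a single feature. This is a minimal illustration with made-up data and hypothetical function names; the learning rate and epoch count are assumptions that happen to work for this toy input.

```python
def fit_analytical(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def fit_gradient_descent(xs, ys, learning_rate=0.05, epochs=5000):
    """Batch gradient descent on the mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
exact = fit_analytical(xs, ys)
approx = fit_gradient_descent(xs, ys)
```

Stochastic and mini-batch variants follow the same update rule but compute the gradient on one sample or a small subset per step instead of the full dataset.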
- SQL Challenges: This repository contains my solutions to online SQL challenges (from HackerRank, LeetCode, TestDome, etc.).
- Data Science Challenges: This repository contains my solutions to online data science challenges (from HackerRank, TestDome, etc.).
- Ranking of NFL teams using Markov-chain methods: In this project, I implemented and compared three approaches based on the stationary distribution of a Markov chain to rank the 32 NFL (National Football League) teams from "Best" to "Worst" using the scores of the 2007 NFL regular season.
- Ranking of Tennis players: The objective of this project is to rank all tennis players based on the matches they played in 2018. The project comprises four ranking approaches, each designed to be more robust than the previous one.
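Ranking by stationary distribution can be sketched as follows: each loss sends probability mass from the loser to the winner, and teams are ordered by their long-run share of that mass. The sketch is one hypothetical construction, not necessarily any of the approaches used in the projects; the damping term, the uniform row for undefeated teams, and the three-team schedule are assumptions made so that the chain has a unique stationary distribution.

```python
def transition_matrix(teams, games, damping=0.85):
    """Build a row-stochastic matrix from (winner, loser) results."""
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    votes = [[0.0] * n for _ in range(n)]
    for winner, loser in games:
        votes[index[loser]][index[winner]] += 1.0  # the loser "votes" for the winner
    matrix = []
    for row in votes:
        total = sum(row)
        if total == 0:
            matrix.append([1.0 / n] * n)  # undefeated team: jump uniformly
        else:
            matrix.append([damping * v / total + (1 - damping) / n for v in row])
    return matrix


def stationary_distribution(matrix, iterations=200):
    """Approximate the stationary distribution by power iteration."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


# Made-up results: A beat B and C, and B beat C.
teams = ["A", "B", "C"]
games = [("A", "B"), ("A", "C"), ("B", "C")]
pi = stationary_distribution(transition_matrix(teams, games))
ranking = [team for _, team in sorted(zip(pi, teams), reverse=True)]
```

Weighting the votes by score margin instead of counting each loss once is a natural variation when game scores are available.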
- Methodologies: Machine Learning, Deep Learning, Time Series Analysis, Natural Language Processing, Statistics, Explainable AI, A/B Testing and Experimentation Design, Big Data Analytics
- Languages: Python (Pandas, NumPy, Scikit-Learn, SciPy, Keras, Matplotlib), R (Dplyr, Tidyr, Caret, Ggplot2), SQL, C++
- Tools: MySQL, Tableau, Git, PySpark, Amazon Web Services (AWS), Flask, MS Excel