Saurabh Arunkumar Somani's Projects

apache-spark-kafka-poc-s

Implemented a Spark machine learning pipeline on AWS EMR that uses collaborative filtering to recommend online educational courses to users based on their viewing history. The target audience was identified with K-Means clustering over 2 billion rows of data. Using Kafka and Spark Structured Streaming, the above models were simulated on real-time events with a window size of 2 minutes. The project also predicts house prices for California residents with Linear Regression on Kaggle’s 2014/15 dataset, and narrows down the customers likely to purchase by comparing Logistic Regression, a Decision Tree Classifier, and Random Forests to choose the best-performing model.
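As a rough illustration of what a 2-minute window means for a stream of events, the sketch below groups events into non-overlapping 2-minute buckets in plain Python. It is not the project's Spark code: the `(timestamp, course_id)` event shape, the function names, and the choice of tumbling (rather than sliding) windows are all assumptions made for the example.

```python
from collections import Counter

WINDOW_SECONDS = 120  # 2-minute window, as in the description


def window_start(timestamp):
    """Return the start (in seconds) of the window containing `timestamp`."""
    return timestamp - timestamp % WINDOW_SECONDS


def count_per_window(events):
    """Count events per (window start, course id).

    `events` is an iterable of (timestamp_seconds, course_id) pairs;
    this event shape is hypothetical and only serves the illustration.
    """
    counts = Counter()
    for timestamp, course_id in events:
        counts[(window_start(timestamp), course_id)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [(0, "sql-101"), (119, "sql-101"), (120, "sql-101"), (130, "ml-201")]
    print(count_per_window(sample))
```

In a streaming engine the same grouping is applied continuously as events arrive, instead of over a finished list.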

beyond-leetcode-sql

Analysis of SQL Leetcode and classic interview questions. Common pitfalls, anti-patterns and handy tricks are discussed. Sample databases are provided.

california_housing_prices

End-to-end machine learning project based on a dataset from the StatLib repository, which is drawn from the 1990 California census. The machine learning models built in this project help determine whether or not to invest in a given area. The final prediction is a district's median housing price. The data includes metrics such as the population, median income, and median housing price for each block group in California. Block groups are the smallest geographical unit for which the US Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people). The system uses multiple features to make a prediction (the district’s population, the median income, etc.), which makes this a **multiple regression problem** with a single predicted value. Hence the performance measure chosen is **_RMSE_** (Root Mean Square Error).
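For reference, RMSE is the square root of the mean of the squared differences between predicted and actual values. A minimal sketch in plain Python follows; the function name and the sample prices are illustrative and not taken from the project.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical median house prices (actual vs. predicted), in dollars.
    actual = [200_000, 150_000, 320_000]
    predicted = [210_000, 140_000, 300_000]
    print(rmse(actual, predicted))
```

Because the errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones, and the result is in the same unit as the predicted price.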

data_analysis_and_visualization_using_pandas

Uses Pandas, NumPy, Matplotlib, Seaborn, Plotly, and Cufflinks to analyze and visualize the financial crisis of 2007/08 and the most crime-affected US regions, with detailed descriptions of what led to those events.

deep-learning-dashboard

DeepDash is a deep learning dashboard that builds predictive stats from image data using two models: K-Nearest Neighbors and VGG19. These stats are then displayed on the front end so that the user can analyze them and develop or choose a strategy for the business problem they are working on. This first release of DeepDash is intended for people who have access to image datasets (public or private).

dgim_project

Mini project implementing the DGIM algorithm for estimating the number of ones in a continuous bit stream. The objective is to estimate the number of ones in the past K bits with a tolerance of 33%. The major challenge with a continuous bit stream is that it cannot be stored in main memory: the bits keep accumulating and eventually exceed the available memory. The DGIM algorithm instead summarizes a window of n bits using O(log n) buckets. Often, it is much more efficient to get an approximate answer than an exact solution.
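The bucket idea behind DGIM can be sketched as follows. This is a minimal illustration, not the project's implementation: the class and parameter names are made up for the example, and it uses the classic variant that keeps at most two buckets of each size, whose worst-case error is 50%. How the project reaches its 33% tolerance is not stated in the description.

```python
class DGIM:
    """Approximate count of 1s in the most recent `window` bits of a stream."""

    def __init__(self, window, max_per_size=2):
        self.window = window
        self.max_per_size = max_per_size  # assumption: classic DGIM uses 2
        self.time = 0
        # Each bucket is (timestamp of its most recent 1, number of 1s it holds).
        # The list is ordered newest first, so sizes never decrease along it.
        self.buckets = []

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its timestamp leaves the window.
        if self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        # Whenever one size has too many buckets, merge its two oldest
        # into a single bucket of twice the size, then check the next size.
        i = 0
        while i < len(self.buckets):
            size = self.buckets[i][1]
            j = i
            while j < len(self.buckets) and self.buckets[j][1] == size:
                j += 1
            if j - i > self.max_per_size:
                newer_of_two_oldest = self.buckets[j - 2]
                self.buckets[j - 2:j] = [(newer_of_two_oldest[0], size * 2)]
                i = j - 2  # the merged bucket starts the next size group
            else:
                i = j

    def count(self, k=None):
        """Estimate the number of 1s in the last `k` bits (k <= window)."""
        k = self.window if k is None else k
        cutoff = self.time - k
        total = 0
        oldest_size = 0
        for timestamp, size in self.buckets:
            if timestamp <= cutoff:
                break
            total += size
            oldest_size = size
        # Count only half of the oldest bucket, since it may straddle the cutoff.
        return total - oldest_size // 2


if __name__ == "__main__":
    counter = DGIM(window=32)
    for _ in range(100):
        counter.add(1)
    print(counter.count())  # an estimate of the true count, which is 32
```

Only the bucket list is kept in memory, never the bits themselves, which is what makes the approach usable on an unbounded stream.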

examples

Apache Kafka and Confluent Platform examples and demos

gitignore

A collection of useful .gitignore templates
