Andrew Schell's Projects
The Clustergrammer interactive Jupyter notebook widget
Turn your Python and JavaScript code into DOT flowcharts
Code and website accompanying Farrell & Lewandowsky's (2017) book
:mortar_board: Path to a free self-taught education in Computer Science!
Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
Multi-environment application constants Python module
Source Codes for "Contrarian Trading Strategies in Python"
What is new, what I've been doing, what you can look at.
The Python programming language
Python solutions for the book Cracking the Coding Interview
Scraper to get data from crunchbase.com and read/write the data using an SQLite database and a JSON file.
CSV_to_SQL: CSV converter for SQL Server
NVIDIA Kepler
Python module for performing basic dense linear algebra computations on the GPU using CUDA.
cuDF - GPU DataFrame Library
Ops School Curriculum
Every model did a reasonably good job, and the best among them was Logistic Regression, because the data is mostly linearly separable. For the same reason, the linear kernel of the SVM performed better than the radial basis function (RBF) kernel and the polynomial kernel (I gave it degree four, though the degree could be increased arbitrarily).

Random Forest has also always been a very good predictor: although it is random, it aggregates many, many decision trees and ultimately finds something good enough. It is also quite fast compared to an SVM with a high-degree polynomial kernel.

Similarly, KNN has been good at classification problems like this one, where the classes are linearly separable; it can also perform well where the data is clustered, which is the beauty of KNN. This was a simple classification problem, and it did a good job. Naive Bayes, however, did worse than the others on this problem, because it makes guesses based on probability rather than finding a pattern in the data.

Lastly, K-means clustering was performed. Although classification and clustering are similar, they are not quite the same thing. For example, data can sometimes be mapped onto a graph in a circular form: it would still be a single class (a classification problem), but a simple clustering technique could not perform as well on such a task. That is what we see happening in this example: K-means clustering was robbed of its glory by this dataset, because the data points were not clustered. The dataset has more than three dimensions, so it cannot be plotted directly. However, I will try to use a feature extraction (dimensionality reduction) technique (if you extend my research time and allow me a few more days to submit this assignment) to bring it down to two or three features, if those features can describe the variation between the predicted and true values.

Yours sincerely, Ashar
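The point about KNN doing well when classes are linearly separable can be sketched with a tiny self-contained example. This is illustrative only and not taken from any of the repositories above: it builds a toy 2-D dataset of two well-separated Gaussian blobs (assumed cluster centres and noise level are made up) and classifies test points by majority vote among the k nearest training points, using only the Python standard library.

```python
# Minimal sketch of k-nearest-neighbours on a linearly separable toy dataset.
# All names, centres, and noise levels here are illustrative assumptions.
import math
import random

random.seed(0)

def make_blob(cx, cy, n, label):
    """Generate n noisy points around centre (cx, cy), tagged with a class label."""
    return [((cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5)), label)
            for _ in range(n)]

# Two well-separated (hence linearly separable) classes.
train = make_blob(0.0, 0.0, 20, 0) + make_blob(5.0, 5.0, 20, 1)
test = make_blob(0.0, 0.0, 5, 0) + make_blob(5.0, 5.0, 5, 1)

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def knn_predict(x, k=3):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: dist(p[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

accuracy = sum(knn_predict(x) == y for x, y in test) / len(test)
print(accuracy)
```

Because the two blobs are far apart relative to their noise, KNN classifies the held-out points perfectly here; on data whose classes interleave (e.g. one class forming a ring around the other), this same distance-based vote degrades, which mirrors the K-means observation above.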
csv import, clean, classify, stem, LDA and analysis