Andrew Schell's Projects
The Clustergrammer interactive Jupyter notebook widget
Turn your Python and JavaScript code into DOT flowcharts
Code and website accompanying Farrell & Lewandowsky's (2017) book
:mortar_board: Path to a free self-taught education in Computer Science!
Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
Multi-environment application constants Python module
Source Codes for "Contrarian Trading Strategies in Python"
What is new, what I've been doing, what you can look at.
The Python programming language
Python solutions for the book Cracking the Coding Interview
Scraper to get data from crunchbase.com and read/write the data using an SQLite database and a JSON file.
CSV_to_SQL: CSV converter for SQL Server
NVIDIA Kepler
Python module for performing basic dense linear algebra computations on the GPU using CUDA.
cuDF - GPU DataFrame Library
Ops School Curriculum
Every model did a reasonably good job, and the best among them was Logistic Regression, because the data is mostly linearly separable. For the same reason, the linear kernel of the SVM performed better than the radial basis function (RBF) kernel and the polynomial kernel (I gave it degree four, though the degree could be increased arbitrarily).

Random Forest has also always been a very good predictor: although it is random, it aggregates many, many decision trees and ultimately finds something good enough. It is also quite fast compared to an SVM with a high-degree polynomial kernel.

Similarly, KNN has been good at classification problems like this one, where the classes are linearly separable; it can also perform well where the data is clustered, which is the beauty of KNN. This was a simple classification problem, and it did a good job. Naive Bayes, however, did worse than the others on this problem, because it makes guesses based on probability rather than finding a pattern in the data.

Lastly, K-means clustering was performed. Although classification and clustering are similar, they are not quite the same thing. For example, data can sometimes be mapped onto a graph in a circular form: it would still be a single class (a classification problem), but a simple clustering technique could not perform as well on such a task. That is what we see happening in this example: K-means clustering was robbed of its glory by this dataset, because the data points were not clustered. The dataset has more than three dimensions, so it cannot be plotted directly. However, I will try to use a feature extraction (dimensionality reduction) technique (if you extend my research time and allow me a few more days to submit this assignment) to bring it down to two or three features, if those features can describe the variation between the predicted and true values.

Yours sincerely, Ashar
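The point about KNN doing well when classes are linearly separable can be sketched with a tiny self-contained example. This is illustrative only and not taken from any of the repositories above: it builds a toy 2-D dataset of two well-separated Gaussian blobs (assumed cluster centres and noise level are made up) and classifies test points by majority vote among the k nearest training points, using only the Python standard library.

```python
# Minimal sketch of k-nearest-neighbours on a linearly separable toy dataset.
# All names, centres, and noise levels here are illustrative assumptions.
import math
import random

random.seed(0)

def make_blob(cx, cy, n, label):
    """Generate n noisy points around centre (cx, cy), tagged with a class label."""
    return [((cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5)), label)
            for _ in range(n)]

# Two well-separated (hence linearly separable) classes.
train = make_blob(0.0, 0.0, 20, 0) + make_blob(5.0, 5.0, 20, 1)
test = make_blob(0.0, 0.0, 5, 0) + make_blob(5.0, 5.0, 5, 1)

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def knn_predict(x, k=3):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: dist(p[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

accuracy = sum(knn_predict(x) == y for x, y in test) / len(test)
print(accuracy)
```

Because the two blobs are far apart relative to their noise, KNN classifies the held-out points perfectly here; on data whose classes interleave (e.g. one class forming a ring around the other), this same distance-based vote degrades, which mirrors the K-means observation above.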
csv import, clean, classify, stem, LDA and analysis