

Market Pulse

  • This project uses sentiment analysis of tweets to help make predictions about the stock market. A tweet is associated with a particular stock symbol when it contains #(stock symbol) or $(stock symbol). For example, a tweet containing #SP500 or $SP500 is assumed to be related to the S&P 500.
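The tagging rule above can be sketched with a regular expression. This is a minimal sketch, not the project's code: the pattern (a `#` or `$` followed by 1 to 6 letters or digits) and the `tracked` set are assumptions made for illustration.

```python
import re

# Hypothetical pattern: '#' or '$' followed by 1-6 letters/digits (e.g. #AAPL, $TSLA, #SP500).
# The 1-6 length limit is an assumption, not a rule taken from the project.
SYMBOL_PATTERN = re.compile(r"[#$]([A-Za-z][A-Za-z0-9]{0,5})\b")


def extract_symbols(text, known_symbols):
    """Return the tracked stock symbols a tweet mentions via #SYMBOL or $SYMBOL."""
    found = {match.upper() for match in SYMBOL_PATTERN.findall(text)}
    # Keep only symbols being tracked, so unrelated hashtags like #tbt are ignored.
    return sorted(found & known_symbols)


tracked = {"AAPL", "TSLA", "YHOO", "SP500"}
print(extract_symbols("Big day for $aapl and #SP500, less so #TSLA #tbt", tracked))
# prints ['AAPL', 'SP500', 'TSLA']
```

Filtering against a known set of symbols keeps ordinary hashtags from being mistaken for tickers.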


Gathering Tweets


Tweets were gathered using the [Tweepy](http://www.tweepy.org/) Python library. Tweets were streamed in real time and saved to a MongoDB database. Anywhere from 4-6 million tweets were gathered per day.

See save_stock_tweets.py for the code.
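The contents of save_stock_tweets.py are not shown here, so the following is only a hedged sketch of one step: flattening a raw tweet into a small record before it is written to MongoDB. It assumes tweets arrive as Twitter API v1.1 JSON dicts with `id`, `text`, and `created_at`; the function name and output field names are hypothetical.

```python
from datetime import datetime

# Twitter's v1.1 JSON writes created_at like "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def to_tweet_record(status):
    """Flatten a raw tweet dict into a record for MongoDB (field names are hypothetical)."""
    created = datetime.strptime(status["created_at"], TWITTER_TIME_FORMAT)
    return {
        "tweet_id": status["id"],
        "text": status["text"],
        "created_at": created,
        # A plain YYYY-MM-DD string makes grouping tweets by day simple later on.
        "day": created.date().isoformat(),
    }


example = {"id": 1, "text": "$AAPL looking strong", "created_at": "Wed Oct 10 20:19:24 +0000 2018"}
record = to_tweet_record(example)
print(record["day"])  # prints 2018-10-10
```

With pymongo, each record could then be stored with `collection.insert_one(record)`. Note that `day` here is the UTC date, which can differ from the New York trading day for late-evening tweets.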

Streaming Stock Quotes


Both historical and current stock quotes were gathered via the [Yahoo Finance](https://pypi.python.org/pypi/yahoo-finance) Python library.

See yahoo_quotes.py for the code. This includes some data cleaning and preliminary modeling.
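As a rough illustration of the kind of cleaning quotes usually need, here is a pandas sketch. It assumes (this is not confirmed by the project) that historical quotes come back as dicts with string `Date` and `Close` fields; the rows below are made up.

```python
import pandas as pd


def clean_quotes(rows):
    """Turn raw quote rows (strings) into a sorted, de-duplicated daily closing series."""
    df = pd.DataFrame(rows)
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    df["Close"] = pd.to_numeric(df["Close"], errors="coerce")
    # Drop rows whose date or price could not be parsed.
    df = df.dropna(subset=["Date", "Close"])
    # Keep one quote per day and order the series chronologically.
    df = df.drop_duplicates(subset="Date", keep="last").sort_values("Date")
    return df.set_index("Date")["Close"]


raw = [
    {"Date": "2016-03-02", "Close": "101.5"},
    {"Date": "2016-03-01", "Close": "100.0"},
    {"Date": "2016-03-02", "Close": "101.5"},
    {"Date": "2016-03-03", "Close": "N/A"},
]
series = clean_quotes(raw)
print(series)
```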

First Attempt
My first attempt at getting stock data involved scraping the NASDAQ website in real time for current and historical stock quotes. See scrape_nasdaq.py for the code. I ended up not using this method because fetching quotes this way was very slow, which made it impractical for live-streaming quotes in a web app.

Exploratory Data Analysis

An easy way to get an idea of what your data is doing is to visualize it. For this project I used TF-IDF (term frequency-inverse document frequency) and Nonnegative Matrix Factorization (NMF) to turn the tweets into an easily interpretable result to graph and model.


So what does this graph tell me? The blue line is a stock symbol's closing price for each day, and the red lines are that symbol's NMF values for each day. From this graph, it looks like when the red lines go up, the closing price tends to go up the next day, and the same may be true when they go down.

See clustering.py for the code.
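The lead-lag pattern described above can also be checked numerically instead of by eye. The sketch below is an assumption about how one might do it, not code from clustering.py: it averages tweet-level NMF scores per symbol per day, pairs each day with the next day's price change, and computes a correlation. All input values are made up.

```python
import pandas as pd

# Toy inputs (made up for illustration): per-tweet scores for one NMF topic,
# and daily closing prices, both keyed by symbol and day.
scores = pd.DataFrame({
    "symbol": ["AAPL"] * 6,
    "day": pd.to_datetime(["2016-03-01", "2016-03-01", "2016-03-02",
                           "2016-03-03", "2016-03-03", "2016-03-04"]),
    "topic_0": [0.10, 0.20, 0.40, 0.30, 0.50, 0.20],
})
closes = pd.DataFrame({
    "symbol": ["AAPL"] * 4,
    "day": pd.to_datetime(["2016-03-01", "2016-03-02", "2016-03-03", "2016-03-04"]),
    "close": [100.0, 101.0, 103.0, 102.0],
})

# Average the tweet-level scores into one value per symbol per day.
daily = scores.groupby(["symbol", "day"], as_index=False)["topic_0"].mean()
merged = daily.merge(closes, on=["symbol", "day"]).sort_values(["symbol", "day"])

# Pair each day's NMF value with the change in price over the *following* day.
merged["next_change"] = merged.groupby("symbol")["close"].shift(-1) - merged["close"]

# A positive correlation would support "NMF up today, price up tomorrow".
lagged_corr = merged["topic_0"].corr(merged["next_change"])
print(merged)
print(lagged_corr)
```

The last day has no next-day change (NaN), and `corr` skips it automatically.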

I can also get an idea of what people are saying about a particular stock symbol by looking at the most used words that relate to it. Enter the word cloud:

Word Cloud for #AAPL or Apple

Word Cloud for #YHOO or Yahoo
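Under the images, a word cloud is just a table of word counts. A minimal sketch, assuming tweets are plain strings and using a small hypothetical stop-word list, of counting the words in tweets that mention a given symbol:

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "and", "is", "for", "to", "rt"}


def word_frequencies(tweets, symbol):
    """Count words in the tweets that mention #symbol or $symbol."""
    tag_pattern = re.compile(rf"[#$]{re.escape(symbol)}\b", re.IGNORECASE)
    counts = Counter()
    for text in tweets:
        if not tag_pattern.search(text):
            continue  # skip tweets about other symbols
        for word in re.findall(r"[a-z']+", text.lower()):
            # Leave out stop words and the symbol itself, which would dominate the cloud.
            if word not in STOP_WORDS and word != symbol.lower():
                counts[word] += 1
    return counts


tweets = [
    "RT $AAPL is up, iphone sales strong",
    "#AAPL iphone event today",
    "$TSLA recall news",
]
print(word_frequencies(tweets, "AAPL").most_common(2))
# prints [('iphone', 2), ('up', 1)]
```

The resulting `Counter` could be passed to the `wordcloud` package's `WordCloud().generate_from_frequencies(...)` to draw the image; the project does not say which tool produced its clouds.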

Modeling

To start, I used a Random Forest Classifier to see whether I could simply predict if a particular stock symbol would increase or decrease in value the following day. This approach reached close to 70% accuracy, so I moved on to a Random Forest Regression model that predicts where a stock's price will close the next day. To evaluate the regression model I used the Mean Squared Error (MSE) and the Root Mean Squared Error (RMSE).
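Both models can be sketched with scikit-learn on synthetic data. This is not the project's code: the features (five random stand-ins for daily NMF values plus today's close), the 80/20 time-ordered split, and `n_estimators=100` are all assumptions, and the printed numbers say nothing about the real results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_squared_error

rng = np.random.RandomState(0)

# Synthetic stand-in data: 200 days of 5 NMF features and a random-walk closing price.
n_days = 200
nmf_features = rng.rand(n_days, 5)
close = 100 + np.cumsum(rng.randn(n_days))

# Each row uses today's NMF values and close to predict tomorrow,
# so the last day (which has no tomorrow) is dropped.
X = np.column_stack([nmf_features[:-1], close[:-1]])
next_close = close[1:]
went_up = (next_close > close[:-1]).astype(int)

# Keep time order: train on the earliest 80% of days, test on the rest.
split = int(0.8 * len(X))

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:split], went_up[:split])
accuracy = accuracy_score(went_up[split:], clf.predict(X[split:]))

reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X[:split], next_close[:split])
predicted_close = reg.predict(X[split:])
mse = mean_squared_error(next_close[split:], predicted_close)
rmse = np.sqrt(mse)  # same units as the price, so easier to read than MSE

print(f"accuracy={accuracy:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}")
```

One design point worth knowing: a random forest regressor averages training targets, so it cannot predict a closing price outside the range it saw during training. Predicting the day-over-day change instead of the price level is a common way around that.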


This image shows a week's worth of closing prices for the TSLA (Tesla) stock symbol. The red box on the right of the graph shows where my model predicts the stock will close that day. (You will probably notice that two points are missing. Those dates fall on a Saturday and a Sunday, when the market is closed and there are no closing prices.)
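Because there are no weekend closes, "the next day" for a Friday is really the following Monday. A small pandas sketch of that mapping, using the `BDay` business-day offset; the helper name is hypothetical, and `BDay` only skips weekends, not exchange holidays.

```python
import pandas as pd
from pandas.tseries.offsets import BDay


def next_trading_day(day):
    """Return the next weekday after `day`; Saturdays and Sundays are skipped."""
    # Assumption: BDay only knows about weekends, not exchange holidays.
    return (pd.Timestamp(day) + BDay(1)).date()


print(next_trading_day("2024-01-05"))  # a Friday; prints 2024-01-08
```

To also skip market holidays, pandas' `CustomBusinessDay` accepts a `holidays` list, which would have to be filled with the exchange's actual closing dates.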

NMF and Regression
When working with Nonnegative Matrix Factorization (NMF), you need a way to figure out the best number of features to use. To choose it, I measured how different numbers of features changed the MSE of the regression model. That code can be found in model_validation.py; it is essentially my own version of a grid search over different numbers of NMF features and different Random Forest parameters.
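The same search can be expressed with scikit-learn's built-in `GridSearchCV`, shown here as an alternative to a hand-written loop rather than as what model_validation.py does. This sketch assumes one "document" per day (that day's tweets joined together) and the next day's close as the target; the vocabulary, data, and parameter grid are synthetic and hypothetical. `TimeSeriesSplit` is used so every validation fold comes after its training days.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
vocab = ["buy", "sell", "bull", "bear", "earnings", "recall", "rally", "crash", "launch", "miss"]

# Synthetic stand-in: one document per day and the next day's closing price as the target.
docs = [" ".join(rng.choice(vocab, size=30)) for _ in range(40)]
next_close = 100 + np.cumsum(rng.randn(40))

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nmf", NMF(init="nndsvda", max_iter=500, random_state=0)),
    ("rf", RandomForestRegressor(random_state=0)),
])

# Hypothetical grid: number of NMF features plus two Random Forest hyperparameters.
param_grid = {
    "nmf__n_components": [2, 3, 5],
    "rf__n_estimators": [25, 50],
    "rf__max_depth": [None, 5],
}

# TimeSeriesSplit only ever validates on days that come after the training days.
search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="neg_mean_squared_error",
    cv=TimeSeriesSplit(n_splits=4),
)
search.fit(docs, next_close)

print(search.best_params_)
print("best MSE:", -search.best_score_)
```

scikit-learn maximizes scores, so MSE is reported as its negative; flipping the sign recovers the MSE.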

Web App

Finally, I wanted to turn this project into a usable application. To do this I used Flask to create a web application that lets a user search different stock symbols, live-stream stock quotes, view historical stock data, and see the predictions my model makes for each stock symbol.
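A minimal Flask sketch of what a prediction endpoint could look like. The route, the `PREDICTIONS` dict, and its placeholder values are hypothetical; a real app would load predictions from the database or call the trained model instead.

```python
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Hypothetical placeholder for the model's latest output (values are not real predictions).
PREDICTIONS = {"TSLA": {"date": "2016-03-07", "predicted_close": 123.45}}


@app.route("/predict/<symbol>")
def predict(symbol):
    """Return the latest prediction for a stock symbol as JSON."""
    prediction = PREDICTIONS.get(symbol.upper())
    if prediction is None:
        abort(404)  # unknown or untracked symbol
    return jsonify(symbol=symbol.upper(), **prediction)
```

Saved as app.py, this could be started with `flask run`; a request to `/predict/tsla` would then return the placeholder entry as JSON.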

Search Page

Streaming Page

Prediction Page

Conclusion

In the end, I believe that unsupervised learning techniques like Nonnegative Matrix Factorization are a great way to generate features for supervised learning techniques like Random Forest Regression. I used a lot of new technologies in this project and learned a lot in the process. I hope this project shows that I am a capable Data Scientist, Application Developer, and Interface Designer, three areas I greatly enjoy working in.
