Giter Club home page Giter Club logo

spark-bdp-stockticker-demo's Introduction

Basho Data Platform Spark Stock Ticker Demo

The Basho Data Platform (BDP) integrated NoSQL databases, caching, in-memory analytics, and search components into a single deployment and management stack.

This project is designed to leverage BDP for stock analysis on AWS. It has been tested with BDP 1.0 Enterprise. Please open an Issue if you have any difficulty deploying your own copy after reading the Installation.

Note: This code is for reference purposes only and should not be run in production. This is very unlikely to make you money on the New York Stock Exchange (NYSE).

Quick Links

Overview

This began as an internship at Basho. Read more about this project in this blog post.

The goal of this project was to create an automated trading signal generator use case for the Basho Data Platform. The signal generator would be run using a cluster of AWS EC2 machines that would boot up each night after the markets closed. After the cluster boots up, the Riak KV database is updated with the previous day's market data.

The database is initially filled with approximately 10 years of data on every stock currently trading on the New York Stock Exchange (NYSE) provided by Google Finance API. Once the database is updated, then analysis creates all possible stock pairs (9 million) and runs an Engle-Granger Cointegration test. Stocks that are cointegrated are then filtered by using a standard deviation parameter. All pairs that are cointegrated and currently outside a standard deviation threshold are then written back into the Riak KV database for further analysis by a separate application before being traded. This process is repeated each night at 1am local time.

Installation

In order to run this demo, the following steps must be completed in order.

  1. Create a blank AWS Ubuntu 12.04 AMI
  2. Download and install the Basho Data Platform on the newly created AMI
  3. Clone this repo on the AMI in dir /home/ubuntu/deploy
  4. On the AMI install pip then run sudo pip install -r /path/to/requirements.txt
  5. Save the AMI
  6. Create one (1) EC2 t2.micro with this AMI, this will act as the launcher
  7. Create five (5) EC2 t2.mediums with this AMI, these will act as the compute cluster
  8. From /home/ubuntu/deploy directory on the t2.micro system, run the following to set up BDP cluster: sudo sh setupBDPAWS.sh

This should form the BDP cluster. This only needs to be run once. To confirm the cluster status, run sudo data-platform-admin services on any node in the cluster and you should see:

 $ sudo data-platform-admin services
 Running Services:
 +------------+---------------+--------------+
 |   Group    |    Service    |     Node     |
 +------------+---------------+--------------+
 |spark-master|my-spark-master|[email protected]|
 |spark-master|my-spark-worker|[email protected]|
 |spark-master|my-spark-worker|[email protected]|
 |spark-master|my-spark-worker|[email protected]|
 |spark-master|my-spark-worker|[email protected]|
 |spark-master|my-spark-worker|[email protected]|
 +------------+---------------+--------------+
  1. From the /home/ubuntu/deploy directory on the t2.micro system, run the following command to automate the process described under Overview: python setupCron.py
  2. Populate the database by running the following: python populateData.py

Configuration

  1. This program should not be run on anything less than the recommended number of AWS instances
  2. Users must modify the 'tickers[0:100]' to 'tickers' in riak-spark.py and updateData.py
  3. You need to place your AWS access key and secret key in riak-spark.py, populateData.py, fabfile.py, run.py and updateData.py
  4. You need to ensure correct permissions on AWS for automatic booting and shutdown as well as opening all incoming and outgoing ports in the security group

This completes the setup. You now have a single t2.micro that will boot a sleeping 5 node BDP t2.medium cluster each night at 1am, update the BDP Riak database, run analysis on all stock pairs on the NYSE, write the results back to the BDP Riak database, shutdown the 5 node BDP t2.medium cluster, and wait until tomorrow to do it all over again, ad infintum.

Architecture

Here is what each file does:

  • NYSE.txt: plaintext file that contains stock symbols for the New York Stock Exchange
  • fabfile.py: Fabric library that can be used to create BDP clusters on AWS
  • pair.py: python library containing methods used to boot the BDP cluster, download data, retrieve data from Riak KV, run the analysis, write data to Riak KV, and shut down the cluster
  • populateData.py: run for the initial fill of the BDP Riak database with raw historical stock data
  • requirements.txt: list of python libraries that need to be installed with pip on all machines in this demo
  • riak-spark.py: this file contains the main functionality of the project. It is where Spark analysis is done
  • run.py: run from the client machine to boot the BDP cluster, update data, run analysis, and shut down the cluster
  • setupBDPAWS.sh: script that uses fabfile.py to automatically launch and configure a BDP Spark cluster on AWS
  • setupCron.py: configures cron on the t2.micro system to schedule the 1am run
  • updateData.py: Used to update raw stock data

License and Authors

Maintainers

You can read the full guidelines for bug reporting and code contributions on the Riak Docs. And thank you! Your contribution is incredibly important to us.

Copyright (c) 2014 Basho Technologies, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

spark-bdp-stockticker-demo's People

Contributors

korry8911 avatar mbbroberg avatar ooshlablu avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.