beatingthemarket

DATA

For the Artificial Neural Network and Support Vector Machine models, use PSEI.xlsx (open, high, low, and close prices) as input. For Sentiment Analysis, use pseiandsentiment.xlsx (sentiment scores and close prices). For Artificial Neural Network - Sentiment Analysis and Support Vector Machine - Sentiment Analysis, use sentimentandprices2.xlsx (sentiment scores, open, high, low, and close prices).

From here on, the sentiment scores (see allscores.xlsx) refer to the scores produced by running Sentiment Analysis.R on five years' worth of daily news articles from the Inquirer.net website (data excluded due to size). The articles are collected using Text Mining.R. After collection, they are tokenized so that stop words can be removed, using the stop_words dataset in R, which combines the snowball, SMART, and onix lexicons. Once the tokenized dataset is clean, the AFINN-111 sentiment lexicon is used to assign a score to each word. These word scores are aggregated so that there is only one score per day, referred to as the sentiment score.
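The scoring steps above can be sketched as follows. This is a minimal illustration in Python, not the repository's Sentiment Analysis.R: the tiny lexicon and stop-word set are hypothetical stand-ins for AFINN-111 and stop_words, the word scores are illustrative values, and summing the word scores per day is an assumption, since the text only says the scores are "aggregated".

```python
import re
from collections import defaultdict

# Hypothetical miniature stand-ins for the real resources. The actual
# AFINN-111 lexicon scores words with integers between -5 and +5, and the
# real stop-word list is far longer. The values below are illustrative.
TOY_LEXICON = {"gain": 2, "strong": 2, "loss": -3, "crisis": -3}
TOY_STOP_WORDS = {"the", "a", "of", "in", "and", "after", "on"}


def daily_sentiment(articles):
    """Return {date: score} for an iterable of (date, text) pairs.

    Assumption: the daily score is the SUM of the word scores of all
    articles published on that date.
    """
    scores = defaultdict(int)
    for date, text in articles:
        # Tokenize: lowercase the text and keep alphabetic tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        # Drop stop words, then look up each remaining word (0 if unlisted).
        kept = [t for t in tokens if t not in TOY_STOP_WORDS]
        scores[date] += sum(TOY_LEXICON.get(t, 0) for t in kept)
    return dict(scores)


articles = [
    ("2018-01-02", "Strong gain in the index"),
    ("2018-01-02", "Fears of a crisis"),
    ("2018-01-03", "Heavy loss after the crisis"),
]
print(daily_sentiment(articles))
# {'2018-01-02': 1, '2018-01-03': -6}
```

Both articles dated 2018-01-02 contribute to the same daily score (2 + 2 - 3 = 1), which is how one score per day is obtained.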

MODELS

The ANN and ANN-SA models use resilient backpropagation (rprop) with weight backtracking. The default error function (sum of squared errors) and the default activation function (logistic function) are used. Because rprop adapts its own step size for each weight, there is no need to set a learning rate. The threshold is set at 0.01. The ANN model has one input layer with four input neurons, two hidden layers with 50 neurons each, and one output layer with one output neuron. The ANN-SA model differs only in having five input neurons in the input layer; the hidden and output layers are the same as in the ANN model. To limit overfitting, the models are trained at most ten times.
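To give a sense of model size, the sketch below counts the trainable weights implied by the two architectures. It assumes fully connected layers with one bias term per non-input neuron; the function name is hypothetical and the code is illustrative Python, not part of the repository.

```python
def count_weights(layer_sizes):
    """Count weights in a fully connected network.

    Assumption: every non-input neuron has one bias term in addition to
    one weight per neuron in the preceding layer.
    """
    pairs = zip(layer_sizes, layer_sizes[1:])
    return sum((n_in + 1) * n_out for n_in, n_out in pairs)


ann = count_weights([4, 50, 50, 1])     # open, high, low, close
ann_sa = count_weights([5, 50, 50, 1])  # the same plus the sentiment score
print(ann, ann_sa)
# 2851 2901
```

The extra input neuron in ANN-SA adds only 50 weights, one to each neuron in the first hidden layer.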

The SVM and SVM-SA models are of eps-regression type. The default values for C (1) and epsilon (0.1) are used. For the kernel function, the default Gaussian radial basis kernel method is used.
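For readers unfamiliar with these terms, here is a minimal Python sketch of the two ingredients named above: the Gaussian radial basis kernel and the epsilon-insensitive loss used in eps-regression. The function names and the gamma value are assumptions for illustration; the source does not state which gamma was used.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian radial basis kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def eps_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Errors smaller than epsilon cost nothing; larger ones cost the excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# gamma = 0.25 is an illustrative value only.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.25))  # identical points
print(eps_insensitive_loss(1.0, 1.05))                 # inside the epsilon tube
print(eps_insensitive_loss(1.0, 1.5))                  # outside the tube
```

The parameter C (1 in the models above) controls how strongly errors outside the epsilon tube are penalized relative to model flatness.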

The SA model is trained with the Random Forest algorithm on the sentiment scores and the close prices.

PERFORMANCE MEASURES

After running the five models, the output should match ALL.xlsx. These results can be evaluated using three performance measures: R-squared, Mean Squared Error, and Directional Accuracy.
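The three measures can be sketched in Python as below. The exact definitions used in the repository are not stated, so these are assumptions: R-squared is taken as 1 minus the ratio of residual to total sum of squares, and Directional Accuracy as the share of days on which the predicted move (relative to the previous actual price) has the same sign as the actual move.

```python
def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """R-squared as 1 - SS_res / SS_tot (one common definition)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def directional_accuracy(actual, predicted):
    """Share of days where the predicted and actual moves agree in sign."""
    hits = 0
    for i in range(1, len(actual)):
        actual_move = actual[i] - actual[i - 1]
        predicted_move = predicted[i] - actual[i - 1]
        if actual_move * predicted_move > 0:
            hits += 1
    return hits / (len(actual) - 1)


# Made-up prices, for illustration only.
actual = [100, 102, 101, 105]
predicted = [99, 101, 103, 104]
print(mse(actual, predicted))                   # 1.75
print(r_squared(actual, predicted))             # 0.5
print(directional_accuracy(actual, predicted))  # 2 of 3 moves correct
```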

TRADING STRATEGY

The trading strategy itself does not involve machine learning techniques. Its algorithm works as follows:

  1. The actual price at day i is compared to the predicted price at day i+1.
  2. If the actual price at day i is less than the predicted price at day i+1, the decision is to buy, provided the investor is not yet holding the stock. The actual price of the stock at day i is then deducted from the investor's seed money. If the investor is already holding the stock, the decision is to hold it until the decision changes to sell, and the seed money does not change.
  3. If the actual price at day i is greater than the predicted price at day i+1, the decision is to sell, provided the investor is holding the stock. The actual price of the stock at day i is then added to the investor's seed money. If the investor is not holding the stock, the decision is to do nothing, and the seed money stays the same.

This algorithm makes use of the following assumptions:

  1. The actual daily closing price is used as both the buying and the selling price when orders are placed. Orders (buying and selling) can be placed only once per day.
  2. The strategy ignores taxes, trading commissions, and other trading fees to simplify the computation of returns.
  3. Buying and selling a stock refers to buying and selling stocks from any of the companies included in the PSEi. The PSEi is treated as the aggregate price of the stocks listed in the index to simplify calculations.
  4. Trading begins with the investor holding PHP 10,000 as initial “seed money”, which can be used to buy an indexed stock when the prediction shows that the stock will increase in value. Since the actual PSEi prices used for the trading strategy range from 7,000 to 8,000 and the seed money is only 10,000, the strategy is limited to buying and selling one stock at a time.
  5. The trading strategy gives only four decisions: "Buy", "Sell", "Hold", and "Do Nothing".
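The decision rules and assumptions above can be sketched as a short simulation. This is an illustrative Python version, not the repository's code; the function name and the sample prices are hypothetical. The case where the actual and predicted prices are equal is not covered by the rules, so keeping the current position in that case is an assumption.

```python
def run_strategy(actual, predicted, seed_money=10_000.0):
    """Simulate the rules on closing prices, one decision per day.

    actual[i] is the closing price on day i; predicted[i] is the model's
    prediction for day i. Returns (decisions, cash, holding).
    """
    cash = seed_money
    holding = False
    decisions = []
    for i in range(len(actual) - 1):
        today, forecast = actual[i], predicted[i + 1]
        if today < forecast:
            if holding:
                decisions.append("Hold")
            else:
                cash -= today  # buy one unit at today's close
                holding = True
                decisions.append("Buy")
        elif today > forecast:
            if holding:
                cash += today  # sell the unit at today's close
                holding = False
                decisions.append("Sell")
            else:
                decisions.append("Do Nothing")
        else:
            # Assumption: a tie keeps the current position unchanged.
            decisions.append("Hold" if holding else "Do Nothing")
    return decisions, cash, holding


# Made-up prices in the 7,000 to 8,000 range, for illustration only.
actual = [7000, 7100, 7050, 7200, 7150]
predicted = [7000, 7080, 7150, 7000, 7100]
decisions, cash, holding = run_strategy(actual, predicted)
print(decisions)  # ['Buy', 'Hold', 'Sell', 'Do Nothing']
print(cash)       # 10050.0
```

In this run the investor buys at 7,000 and sells at 7,050, so the seed money ends 50 higher than it started.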

To evaluate the trading strategy, the annualized return (computed as the mean of the logarithmic returns multiplied by the number of trading days) is used.
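A minimal Python sketch of this computation follows. Two points are assumptions, since the text does not specify them: the logarithmic returns are taken over a daily series of portfolio values, and the number of trading days is a parameter whose default of 252 is only a common convention.

```python
import math


def annualized_return(values, trading_days=252):
    """Mean daily log return multiplied by the number of trading days.

    values: daily series (assumed here to be portfolio value), all positive.
    trading_days: assumed default of 252; the source does not state a number.
    """
    log_returns = [math.log(b / a) for a, b in zip(values, values[1:])]
    return sum(log_returns) / len(log_returns) * trading_days


# Made-up series growing 1% per day, for illustration only.
values = [10_000.0, 10_100.0, 10_201.0]
print(annualized_return(values))
```

Because log returns add up over time, their mean times the number of periods equals the total log return over a span of that length.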

RESULTS

Based on the results, ANN performs better than SVM. Combining ANN with SA improves on the stand-alone ANN (as seen in the ANN-SA model), but combining SVM with SA does not improve on the stand-alone SVM (as seen in the SVM-SA model). For the trading strategy, the ANN model achieves the highest return of the five, which suggests that the magnitude of the returns depends heavily on the accuracy of the model's predictions.

beatingthemarket's People

Contributors

noemimejia

beatingthemarket's Issues

Did you choose this approach with a stock that has accurate results?

Hi,

I have been using your code, especially the sentiment portion, and found it very insightful.

As for the last portion where you predict using neural networks with sentiment, I am not seeing the results to be that accurate.

In your case, did you test this on multiple tickers to see which one was the most accurate before choosing the ticker in your example based on this model?

I also broke out the sentiment by category such as technology, usa, business, etc.

Overall very helpful though!

New complementary tool

My name is Luis, I'm a big-data machine-learning developer, I'm a fan of your work, and I usually check your updates.

I was afraid that my savings would be eaten by inflation, so I have created a powerful tool based on past technical patterns (volatility, moving averages, statistics, trends, candlesticks, support and resistance, stock index indicators).
It covers all the ones you know (RSI, MACD, STOCH, Bollinger Bands, SMA, DeMark, Japanese candlesticks, Ichimoku, Fibonacci, Williams %R, balance of power, Murrey Math, etc.) and more than 200 others.

The tool creates prediction models of correct trading points (buy and sell signals, so that every stock is traded at the right time and in the right direction).
For data collection and calculation, I have used big-data tools like pandas (Python) and stock market libraries like tablib, TAcharts, pandas_ta...
I have also used powerful machine-learning libraries such as Sklearn.RandomForest, Sklearn.GradientBoosting, XGBoost, Google TensorFlow, and Google TensorFlow LSTM.

With the models trained with the selection of the best technical indicators, the tool is able to predict trading points (where to buy, where to sell) and send real-time alerts to Telegram or Mail. The points are calculated based on the learning of the correct trading points of the last 2 years (including the change to bear market after the rate hike).

I think it could be useful to you, and I would like to share it with you. If you are interested in improving it and collaborating, I am willing; if not, just file this away.
