
ngods stock market demo

This repository contains a stock market analysis demo of the ngods data stack. The demo performs the following steps:

  1. Download selected stock symbols data from Yahoo Finance API.
  2. Store the stock data in ngods data warehouse (using Iceberg format).
  3. Transform the data (e.g. normalize stock prices) using dbt.
  4. Expose analytics data model using cube.dev.
  5. Visualize data as reports and dashboards using Metabase.
  6. Predict stock prices using ARIMA in Apache Spark.

The demo is packaged as a docker-compose script that downloads, installs, and runs all components of the data stack.
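For a sense of what step 1 involves, here is a minimal Python sketch using the yfinance package; the demo's actual download logic lives in its Dagster project, so treat this as an illustrative stand-in rather than the code it ships:

import yfinance as yf

# Download daily OHLCV data for a couple of symbols (symbols and date
# range are illustrative).
df = yf.download(["AAPL", "MSFT"], start="2020-01-01", end="2022-06-30")

# Persist as CSV so it could be staged for the bronze Iceberg tables.
df.to_csv("stocks.csv")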

UPDATES

  • 2023-02-03:
    • Upgrade to Apache Iceberg 1.1.0
    • Upgrade to Trino 406
    • Migrated to the new JDBC catalog (removed the heavyweight Hive Metastore)

ngods

ngods stands for New Generation Opensource Data Stack. It includes the following components:

ngods components

ngods is open-sourced under the BSD license and is distributed as a docker-compose script that supports Intel and ARM architectures.

Running the demo

ngods requires a machine with at least 16GB RAM and an Intel or ARM64 CPU running Docker, with docker-compose installed.

  1. Clone the ngods repo:

git clone https://github.com/zsvoboda/ngods-stocks.git

  2. Start the data stack with the docker-compose up command:

cd ngods-stocks

docker-compose up -d

NOTE: This can take quite a while depending on your network speed.

  3. Stop the data stack with the docker-compose down command:

docker-compose down

  4. Execute the data pipeline from the Dagster console at http://localhost:3070/ with this yaml config file.

Dagster e2e

Cut and paste the content of the e2e.yaml file into this Dagster UI console page and start the data pipeline by clicking the Launch Run button.

NOTE: You can customize the list of stock symbols that will be downloaded.
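If you prefer to see the shape of that configuration, the sketch below shows a hypothetical run config as a Python dict; the real op and key names are defined in e2e.yaml and the Dagster project, so the identifiers here are placeholders only:

# Hypothetical Dagster run config -- the op name and config key are
# invented for illustration; check e2e.yaml for the actual schema.
run_config = {
    "ops": {
        "download_stock_data": {
            "config": {"symbols": ["AAPL", "MSFT", "GOOG", "IBM"]},
        }
    }
}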

  5. Review and customize the cube.dev metrics and dimensions. Test these metrics in the cube.dev playground.

cube.dev playground

See the cube.dev documentation for more information.

  6. Check out the Metabase data visualizations connected to the cube.dev analytical model. You can also run SQL queries on top of the cube.dev schema.

Use username [email protected] and password metabase1.

Metabase

You can create your own data visualizations and dashboards. See the Metabase documentation for more information.

  7. Predict the stock close price. Run the ARIMA time-series prediction model notebook, which is trained on 29 months of Apple (AAPL) stock data and predicts the next month.

Jupyter ARIMA
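A minimal sketch of this kind of forecast, assuming pandas and statsmodels (the notebook itself runs in the Spark-backed Jupyter environment, and its column names and ARIMA order may differ):

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Assume a daily close-price series indexed by date; the file and column
# names are illustrative.
prices = pd.read_csv("aapl.csv", index_col="Date", parse_dates=True)

# Fit an ARIMA(p, d, q) model; this order is a common starting point,
# not necessarily what the demo notebook uses.
fitted = ARIMA(prices["Close"], order=(5, 1, 0)).fit()

# Forecast roughly one month of trading days ahead.
print(fitted.forecast(steps=21))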

  8. Download the DBeaver SQL tool.

  9. Connect to the Postgres database that contains the gold stage data. Use the jdbc:postgresql://localhost:5432/ngods JDBC URL with username ngods and password ngods.

Postgres JDBC connection
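The same connection can be checked from Python, assuming the psycopg2 package is installed; the endpoint and credentials are the ones listed above:

import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432, dbname="ngods",
    user="ngods", password="ngods",
)
with conn.cursor() as cur:
    # List the available schemas; you should see the gold schema here.
    cur.execute("SELECT schema_name FROM information_schema.schemata")
    print(cur.fetchall())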

  10. Connect to the Trino database that has access to all data stages (the bronze, silver, and gold schemas of the warehouse database). Use the jdbc:trino://localhost:8060 JDBC URL with username trino and password trino.

Trino JDBC connection

Trino schemas
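From Python, the equivalent connection can be made with the trino client package (pip install trino); the catalog and schema names below follow the warehouse.bronze naming used in this README:

import trino

conn = trino.dbapi.connect(
    host="localhost", port=8060, user="trino",
    catalog="warehouse", schema="bronze",
)
cur = conn.cursor()
cur.execute("SHOW TABLES")
print(cur.fetchall())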

  11. Connect to the Spark database that is used for data transformations. Use the jdbc:hive2://localhost:10009 JDBC URL with no username or password.

Spark JDBC connection
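Because this endpoint speaks the HiveServer2 protocol (hence the jdbc:hive2 URL), a Python client such as PyHive can connect to it as well; this is a connectivity sketch, not part of the demo itself:

from pyhive import hive

# No username or password, per the README.
conn = hive.Connection(host="localhost", port=10009)
cur = conn.cursor()
cur.execute("SHOW SCHEMAS")
print(cur.fetchall())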

Customizing the demo

This section contains useful information for customizing the demo.

ngods directories

Here are a few of the distribution's directories that you may need to customize:

  • conf configuration of all data stack components
    • cube cube.dev schema (semantic model definition)
  • data main data directory
    • minio root data directory (contains buckets and file data)
    • spark Jupyter notebooks
    • stage file stage data. Spark can access this directory via the /var/lib/ngods/stage path (see the pyspark sketch after this list).
  • projects dbt, Dagster, and DataHub projects
    • dagster Dagster orchestration project
    • dbt dbt transformations (one project per medallion stage: bronze, silver, and gold)
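As mentioned above, Spark sees the file stage under /var/lib/ngods/stage. Here is a minimal pyspark sketch of reading a staged file (the file name is illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ngods-stage").getOrCreate()

# Read a CSV file dropped into ./data/stage on the host.
df = spark.read.csv("/var/lib/ngods/stage/stocks.csv",
                    header=True, inferSchema=True)
df.show(5)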

ngods endpoints

The data stack has the following endpoints:

ngods databases: Spark, Trino, and Postgres

The ngods stack includes three database engines: Spark, Trino, and Postgres. Both Spark and Trino have access to the Iceberg tables in the warehouse.bronze and warehouse.silver schemas. The Trino engine can also access the analytics.gold schema in Postgres, so Trino can federate queries between the Postgres and Iceberg tables.
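For example, a federated query can join an Iceberg table with a Postgres table in a single statement. In the sketch below the table and column names are hypothetical, while the catalog and schema names follow this README:

import trino

conn = trino.dbapi.connect(host="localhost", port=8060, user="trino")
cur = conn.cursor()
cur.execute("""
    SELECT s.symbol, s.close, g.sector
    FROM warehouse.silver.stock_prices s      -- Iceberg (hypothetical table)
    JOIN analytics.gold.symbol_metadata g     -- Postgres (hypothetical table)
      ON s.symbol = g.symbol
""")
print(cur.fetchmany(5))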

The Spark engine is configured for ELT and pyspark data transformations.

Spark
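A minimal pyspark ELT sketch in the spirit of the bronze-stage load, assuming the session is already configured with the Iceberg catalog (as the demo's Spark containers are); the target table name is hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ngods-elt").getOrCreate()

# Read the staged CSV and land it in a bronze Iceberg table.
raw = spark.read.csv("/var/lib/ngods/stage/stocks.csv",
                     header=True, inferSchema=True)
raw.writeTo("warehouse.bronze.stocks").createOrReplace()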

The Trino engine is configured for data federation between the Iceberg and Postgres tables. Additional catalogs can be configured as needed.

Trino

The Postgres database has access only to the analytics.gold schema and is used for executing analytical queries over the gold data.

Demo data pipeline

The demo data pipeline utilizes the medallion architecture with bronze, silver, and gold data stages

data pipeline

and consists of the following phases:

  1. Data are downloaded from the Yahoo Finance REST API to the local Minio bucket (./data/stage) using this Dagster operation.
  2. The downloaded CSV file is loaded into the bronze stage Iceberg tables (warehouse.bronze Spark schema) using dbt models that are executed in Spark (./projects/dbt/bronze).
  3. Silver stage Iceberg tables (warehouse.silver Spark schema) are created using dbt models that are executed in Spark (./projects/dbt/silver).
  4. Gold stage Postgres tables (analytics.gold Trino schema) are created using dbt models that are executed in Trino (./projects/dbt/gold).

DBT models

All data pipeline phases are orchestrated by the Dagster framework. Dagster operations, resources, and jobs are defined in the Dagster project.

Dagster console

The pipeline is executed by running the e2e job from the Dagster console at http://localhost:3070/ using this yaml config file.
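To show how such a pipeline hangs together, here is a hedged structural sketch of a Dagster job; the real op names, resources, and dbt invocations live in ./projects/dagster, so everything below is a placeholder for the structure rather than the demo's actual code:

from dagster import job, op

@op
def download_stocks() -> str:
    # Phase 1: pull CSVs from Yahoo Finance into the stage directory.
    return "/var/lib/ngods/stage/stocks.csv"

@op
def run_dbt_stages(path: str) -> None:
    # Phases 2-4: trigger the bronze/silver/gold dbt models.
    ...

@job
def e2e():
    run_dbt_stages(download_stocks())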

ngods analytics layer

ngods includes cube.dev for the semantic data model and Metabase for self-service analytics (dashboards, reports, and visualizations).

Analytics

The analytical (semantic) model is defined in cube.dev and is used for executing analytical queries over the gold data.

cube.dev

Metabase is connected to cube.dev via its SQL API. End users can use it for the self-service creation of dashboards, reports, and data visualizations. Metabase is also directly connected to the gold schema in the Postgres database.

Metabase
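Any Postgres-protocol client can talk to the same SQL API that Metabase uses. The port and cube/cube credentials below come from a user report in the issues section of this page, so treat them as an assumption about this demo's configuration:

import psycopg2

# Port and credentials are an assumption (see the cube-via-JDBC issue below).
conn = psycopg2.connect(
    host="localhost", port=3245, dbname="cube",
    user="cube", password="cube",
)
with conn.cursor() as cur:
    # Relation name is illustrative; check the cube.dev playground for the
    # cubes actually exposed.
    cur.execute('SELECT * FROM "stocks" LIMIT 5')
    print(cur.fetchall())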

ngods machine learning

Jupyter Notebooks with Scala, Java, and Python backends can be used for machine learning.

Jupyter

Support

Create a GitHub issue if you have any questions.

ngods-stocks's People

Contributors

mspronk, zsvoboda


ngods-stocks's Issues

exec /usr/bin/entrypoint.sh: no such file or directory

Hi,

It might be rookie me, but I can't seem to get rid of the error:

exec /usr/bin/entrypoint.sh: no such file or directory

coming from the aio container. I am running on Windows; I don't know if this is what causes an error in the mapping, or if it is a general issue.

requirements.txt - Python dependency resolution stops with an error

366.0 WARNING: jsonschema 4.5.1 does not provide the extra 'format-nongpl'
366.0 WARNING: jsonschema 4.5.0 does not provide the extra 'format-nongpl'
366.0 WARNING: jsonschema 4.4.0 does not provide the extra 'format-nongpl'
366.0 WARNING: jsonschema 4.3.3 does not provide the extra 'format-nongpl'
366.0 WARNING: jsonschema 4.3.2 does not provide the extra 'format-nongpl'
366.0 WARNING: jsonschema 4.3.1 does not provide the extra 'format-nongpl'
366.0 WARNING: jsonschema 4.3.0 does not provide the extra 'format-nongpl'
366.0 WARNING: jsonschema 4.2.1 does not provide the extra 'format-nongpl'
366.0 WARNING: jsonschema 4.2.0 does not provide the extra 'format-nongpl'
366.0 WARNING: jsonschema 4.1.2 does not provide the extra 'format-nongpl'
366.0 WARNING: jsonschema 4.1.1 does not provide the extra 'format-nongpl'
366.0 WARNING: jsonschema 4.1.0 does not provide the extra 'format-nongpl'
366.0 WARNING: jsonschema 4.0.1 does not provide the extra 'format-nongpl'
366.0 WARNING: jsonschema 3.2.0 does not provide the extra 'format-nongpl'
366.4 INFO: pip is looking at multiple versions of pickleshare to determine which version is compatible with other requirements. This could take a while.
366.4 Downloading pickleshare-0.4.tar.gz (11 kB)
366.4 Preparing metadata (setup.py): started
366.6 Preparing metadata (setup.py): finished with status 'error'
366.6 error: subprocess-exited-with-error
366.6
366.6 × python setup.py egg_info did not run successfully.
366.6 │ exit code: 1
366.6 ╰─> [11 lines of output]
366.6       Traceback (most recent call last):
366.6         File "<string>", line 2, in <module>
366.6         File "<pip-setuptools-caller>", line 34, in <module>
366.6         File "/tmp/pip-install-40xfnxrc/pickleshare_118f0a7d8a57465088b38d568a6c39b0/setup.py", line 3, in <module>
366.6           import pickleshare
366.6         File "/tmp/pip-install-40xfnxrc/pickleshare_118f0a7d8a57465088b38d568a6c39b0/pickleshare.py", line 41, in <module>
366.6           from path import path as Path
366.6         File "/tmp/pip-install-40xfnxrc/pickleshare_118f0a7d8a57465088b38d568a6c39b0/path.py", line 724
366.6           def mkdir(self, mode=0777):
366.6                                   ^
366.6       SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
366.6       [end of output]
366.6
366.6 note: This error originates from a subprocess, and is likely not a problem with pip.
366.6 error: metadata-generation-failed
366.6
366.6 × Encountered error while generating package metadata.
366.6 ╰─> See above for output.
366.6
366.6 note: This is an issue with the package mentioned above, not pip.
366.6 hint: See above for details.
366.7
366.7 [notice] A new release of pip is available: 23.0.1 -> 23.2.1
366.7 [notice] To update, run: pip install --upgrade pip

failed to solve: process "/bin/sh -c pip3 install --no-cache-dir -r requirements.txt && rm requirements.txt" did not complete successfully: exit code: 1

Custom CSV

Hi! I was trying to add a custom CSV file, but I wasn't able to do so.

Is there a way to add custom ones using the base you have?

Thank you so much in advance; I'll appreciate anything you could tell me!

Trino Error

First, awesome example.

Second, hopefully this is a silly issue which goes away easily.

Whenever I try to start the Trino container (e.g. docker-compose up), it immediately shuts down with the following error:
'ERROR: Trino requires at least 4096 file descriptors (found 1024)'

Have you dealt with this before?

Error starting containers

Cloned the repo, and while building:

#0 173.5 INFO: pip is looking at multiple versions of pickleshare to determine which version is compatible with other requirements. This could take a while.
#0 174.0   Downloading pickleshare-0.4.tar.gz (11 kB)
#0 174.1   Preparing metadata (setup.py): started
#0 174.2   Preparing metadata (setup.py): finished with status 'error'
#0 174.2   error: subprocess-exited-with-error
#0 174.2   
#0 174.2   × python setup.py egg_info did not run successfully.
#0 174.2   │ exit code: 1
#0 174.2   ╰─> [11 lines of output]
#0 174.2       Traceback (most recent call last):
#0 174.2         File "<string>", line 2, in <module>
#0 174.2         File "<pip-setuptools-caller>", line 34, in <module>
#0 174.2         File "/tmp/pip-install-9s9g_51u/pickleshare_7951e09ae637468caa6efddd2108a27f/setup.py", line 3, in <module>
#0 174.2           import pickleshare
#0 174.2         File "/tmp/pip-install-9s9g_51u/pickleshare_7951e09ae637468caa6efddd2108a27f/pickleshare.py", line 41, in <module>
#0 174.2           from path import path as Path
#0 174.2         File "/tmp/pip-install-9s9g_51u/pickleshare_7951e09ae637468caa6efddd2108a27f/path.py", line 724
#0 174.2           def mkdir(self, mode=0777):
#0 174.2                                   ^
#0 174.2       SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
#0 174.2       [end of output]

open source questions

Hi,

is there an alternative for cube.dev?

Also, what would you recommend in terms of upgrades if this ever needs to work with real-time use cases?

Appreciate this, btw! For reference, I'm trying to build learning material for data engineers, and an open-source modern data stack is what's lacking!
Myk

Kyuubi-1.6.0 not supported

./aio/Dockerfile on line 89: Kyuubi 1.6.0 should be changed to Kyuubi 1.6.1 since the old version is not downloadable anymore.

Datahub won't start in the docker compose setup

Thank you very much for your hard work!
I know it is challenging to tame all the different software packages.

The standard docker-compose.yml works like a charm, but with the .x86 one I have the problem that I cannot get the datahub-gms container, as well as elasticsearch, to stay in a healthy state.

I use an 8 vCPU, 64 GiB machine on DigitalOcean to tinker with the setup for myself.
Is there maybe a trick to get Datahub working? Next to Iceberg, this is the thing I would really like to test :)

Thank you very much!

Cannot run SQL in cube via JDBC

Hi, this project works fine, but when I try to connect to cube via DBeaver (with the jdbc:postgresql://localhost:3245/cube JDBC URL, username cube / password cube), the connection succeeds, but we can only see the database cube and the schema public. No tables or views can be used.

Metabase stuck in a loop

Hello!

I'm trying to start the system via docker compose up. All pieces come up nicely, except for Metabase, which seems stuck in a loop with this error message:

metabase  | 2023-04-19 07:14:05,812 INFO db.update-h2 :: H2 v1 database detected, updating...
metabase  | 2023-04-19 07:14:05,812 INFO db.update-h2 :: Creating v1 database backup at /tmp/metabase-migrate-h2-db-v1-v2.sql
metabase  | 2023-04-19 07:14:06,340 INFO db.update-h2 :: Moving old app database to /conf/metabase.db.v1-backup.mv.db
metabase  | 2023-04-19 07:14:06,341 ERROR db.update-h2 :: Failed to update H2 database: #error {
metabase  |  :cause /conf/metabase.db.mv.db -> /conf/metabase.db.v1-backup.mv.db
metabase  |  :via
metabase  |  [{:type java.nio.file.AccessDeniedException
metabase  |    :message /conf/metabase.db.mv.db -> /conf/metabase.db.v1-backup.mv.db
metabase  |    :at [sun.nio.fs.UnixException translateToIOException UnixException.java 90]}]
metabase  |  :trace
metabase  |  [[sun.nio.fs.UnixException translateToIOException UnixException.java 90]
metabase  |   [sun.nio.fs.UnixException rethrowAsIOException UnixException.java 106]
metabase  |   [sun.nio.fs.UnixCopyFile move UnixCopyFile.java 481]
metabase  |   [sun.nio.fs.UnixFileSystemProvider move UnixFileSystemProvider.java 266]
metabase  |   [java.nio.file.Files move Files.java 1430]
metabase  |   [metabase.db.update_h2$update_BANG_ invokeStatic update_h2.clj 81]
metabase  |   [metabase.db.update_h2$update_BANG_ invoke update_h2.clj 68]
metabase  |   [metabase.db.update_h2$update_if_needed invokeStatic update_h2.clj 98]
metabase  |   [metabase.db.update_h2$update_if_needed invoke update_h2.clj 90]
metabase  |   [metabase.db.data_source.DataSource getConnection data_source.clj 29]
metabase  |   [com.mchange.v2.c3p0.WrapperConnectionPoolDataSource getPooledConnection WrapperConnectionPoolDataSource.java 161]
metabase  |   [com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager acquireResource C3P0PooledConnectionPool.java 213]
metabase  |   [com.mchange.v2.resourcepool.BasicResourcePool doAcquire BasicResourcePool.java 1176]
metabase  |   [com.mchange.v2.resourcepool.BasicResourcePool doAcquireAndDecrementPendingAcquiresWithinLockOnSuccess BasicResourcePool.java 1163]
metabase  |   [com.mchange.v2.resourcepool.BasicResourcePool access$700 BasicResourcePool.java 44]
metabase  |   [com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask run BasicResourcePool.java 1908]
metabase  |   [com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread run ThreadPoolAsynchronousRunner.java 696]]}

I assume it's a matter of permissions, because the H2 DB file resides on the local drive, although I tried chmod 777 on data and it's the same. Could you please share the version of the Metabase container that you used as the base image?

Thank you!

new

My name is Luis. I'm a big-data machine-learning developer, I'm a fan of your work, and I usually check your updates.

I was afraid that my savings would be eaten by inflation, so I created a powerful tool based on past technical patterns (volatility, moving averages, statistics, trends, candlesticks, support and resistance, stock index indicators):
all the ones you know (RSI, MACD, STOCH, Bollinger Bands, SMA, DEMARK, Japanese candlesticks, Ichimoku, Fibonacci, WilliamsR, balance of power, Murrey math, etc.) and more than 200 others.

The tool creates prediction models of correct trading points (buy signal and sell signal; every stock is well traded in time and direction).
For this I have used big-data tools like pandas, stock market libraries such as tablib, TAcharts, and pandas_ta for data collection and calculation,
and powerful machine-learning libraries such as Sklearn.RandomForest, Sklearn.GradientBoosting, XGBoost, Google TensorFlow, and Google TensorFlow LSTM.

With the models trained on the selection of the best technical indicators, the tool is able to predict trading points (where to buy, where to sell) and send real-time alerts to Telegram or Mail. The points are calculated based on learning the correct trading points of the last 2 years (including the change to a bear market after the rate hike).

I think it could be useful to you. I would like to share it with you, and if you are interested in improving and collaborating I am also willing; if not, just file it away.

If you want, please read the readme, and in case of any problem you can contact me.
If you are convinced, try to install it with the documentation:
https://github.com/Leci37/LecTrade/tree/develop I appreciate the feedback
