

Sparks Networks - Junior Data Engineer task

Tech stack used:

* Python/Pandas (scripting / programming)
* Airflow (orchestrator)
* PostgreSQL (database) data warehouse
* BigQuery data warehouse
* Docker containers
* git (code / version management)

See dataflow.png for the high-level design of the project.

For PII, we use column-level access so that end users are given only the columns they need and are restricted from all other columns. See the file "sparks.sql" for complete details of the tables, roles, and other objects.
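As an illustration of column-level access, the sketch below builds a PostgreSQL column-level `GRANT SELECT (...)` statement for a role. The table, column, and role names are hypothetical placeholders, not taken from sparks.sql, which remains the reference for the real objects.

```python
from typing import Iterable


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def column_grant(table: str, columns: Iterable[str], role: str) -> str:
    """Build a column-level GRANT so the role can read only the listed columns."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return f"GRANT SELECT ({cols}) ON {quote_ident(table)} TO {quote_ident(role)};"


# Hypothetical non-PII columns; the real tables and roles are in sparks.sql.
NON_PII_COLUMNS = ["user_id", "city", "created_at"]

if __name__ == "__main__":
    # Prints: GRANT SELECT ("user_id", "city", "created_at") ON "users" TO "analyst";
    print(column_grant("users", NON_PII_COLUMNS, "analyst"))
```

A column-level grant only restricts access if the role does not also hold table-level SELECT on the same table, so any such table-level privilege would need to be revoked first.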

The dags folder contains the sparkdag.py file, which is responsible for the ETL process.
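The contents of sparkdag.py are not reproduced here, so the following is only a minimal sketch of the extract, transform, and load shape such a DAG might chain. The function names, the CSV input, the "drop rows without an id" rule, and the in-memory sink are all assumptions for illustration. It uses only the standard library so that it runs without Airflow; in a real DAG each function would typically become its own task.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

Row = Dict[str, str]


def extract(source: TextIO) -> List[Row]:
    """Read CSV rows from a file-like object (stand-in for the real source)."""
    return list(csv.DictReader(source))


def transform(rows: Iterable[Row]) -> List[Row]:
    """Trim whitespace and drop rows with an empty id (hypothetical rule)."""
    cleaned = []
    for row in rows:
        # Skip overflow fields (key None) and treat missing values as "".
        row = {k: (v or "").strip() for k, v in row.items() if k is not None}
        if row.get("id"):
            cleaned.append(row)
    return cleaned


def load(rows: Iterable[Row], sink: List[Row]) -> int:
    """Append rows to a list; stands in for a database write."""
    before = len(sink)
    sink.extend(rows)
    return len(sink) - before


def run_pipeline(source: TextIO, sink: List[Row]) -> int:
    """Run extract -> transform -> load and return the number of rows loaded."""
    return load(transform(extract(source)), sink)


if __name__ == "__main__":
    warehouse: List[Row] = []
    sample = io.StringIO("id,name\n1, Ann \n,Bob\n")
    print(run_pipeline(sample, warehouse), warehouse)
```

Keeping each stage a plain function makes the stages easy to test on their own before they are wired into an orchestrator.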

Steps to generate and run the DAG.

1> Clone the repository using git clone https://github.com/Villain1401036/Sparks.git

2> Install Docker and Docker Compose.
See https://docs.docker.com/engine/install/ubuntu/ for more information.

3> Go into the Sparks directory and run docker compose up:

  cd Sparks
  docker compose up 

4> Run postgresinstall.sh from the Sparks folder (/Sparks/postgresinstall.sh):

  sh postgresinstall.sh
  
  This installs PostgreSQL and creates the tables in the PostgreSQL database.

5> Open the Airflow webserver UI on port 8080 of the host machine, e.g. 192.168.4.21:8080

6> Open Connections and add a GCP connection with these values:

* Connection id - gcp_conn_default
* Connection type - Google Cloud
* Project Id - sparks-363212
* Keyfile JSON - (for privacy reasons, please ask me at [email protected] for setup)

7> Run the Sparks DAG from the DAGs list in the Airflow UI.
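Before triggering the DAG, it can help to confirm that the webserver from step 5 is reachable and that the key pasted in step 6 looks like a service-account key. The sketch below is an optional pre-flight check and is not part of the repository. The function names are hypothetical, and the field list is an assumption based on the fields Google service-account key files normally contain.

```python
import json
import socket
from typing import List

# Fields a Google service-account key file normally contains (assumed subset).
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_key_fields(keyfile_json: str) -> List[str]:
    """Return the expected service-account fields absent from the key JSON."""
    data = json.loads(keyfile_json)
    return [field for field in REQUIRED_KEY_FIELDS if field not in data]


# Example usage (host taken from step 5; adjust to your machine):
#   is_port_open("192.168.4.21", 8080)
#   missing_key_fields(open("keyfile.json").read())  # hypothetical file name
```

An empty list from `missing_key_fields` only means the JSON has the expected shape; it does not verify that the key is valid for the project.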

** The project was built on a virtual machine running Ubuntu; on other systems some steps may differ.
