bayoadejare / lightning-containers

Docker powered starter for geospatial analysis of lightning atmospheric data.

Home Page: https://lightning-containers.streamlit.app/

License: Apache License 2.0

Languages: Dockerfile 1.18%, Python 37.07%, Jupyter Notebook 61.74%

Topics: clustering-analysis, csv-files, data-engineer, data-engineering-pipeline, data-warehouse, databases, docker, jupyter, machine-learning-algorithms, noaa-weather

⚡Lightning Containers: docker-powered lightning atmospheric dataset 📈

Introduction

This is a monolithic Docker image to help you get started with geospatial analysis and visualization of lightning atmospheric data. The data comes from the US National Oceanic and Atmospheric Administration (NOAA) Geostationary Lightning Mapper (GLM) data product, sourced from AWS S3 buckets. There are currently two main components:

  1. ETL Ingestion - data ingestion and analysis processes.
  2. Streamlit dashboard app - frontend GIS visualization dashboard.

Processing is done with Pandas dataframes, with SQLite and its SpatiaLite extension as local storage and a self-hosted Prefect server instance for orchestration and observability of the processing pipelines.

Architecture: Docker + Prefect + Pandas + SQLite + Streamlit

Brief Data Summary: Lightning Cluster Filter Algorithm (LCFA)

The multidimensional data structures stored in the netCDF4 files contain a rich variety of
data, including descriptive metadata. The main variables (events, groups, and flashes)
form a hierarchy: a series of detected radiant events is clustered into groups, and groups
are clustered into flashes using LCFA.
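
As an illustration of that hierarchy, the sketch below nests toy event and group records under their parent flashes. The field names (`event_parent_group_id`, `group_parent_flash_id`) are modeled on the parent-ID variables in GLM files but are assumptions here, as are the `build_hierarchy` helper and all values; it does not read a real netCDF4 file.

```python
from collections import defaultdict

# Toy records; field names and values are illustrative assumptions.
events = [
    {"event_id": 1, "event_parent_group_id": 10},
    {"event_id": 2, "event_parent_group_id": 10},
    {"event_id": 3, "event_parent_group_id": 11},
    {"event_id": 4, "event_parent_group_id": 12},
]
groups = [
    {"group_id": 10, "group_parent_flash_id": 100},
    {"group_id": 11, "group_parent_flash_id": 100},
    {"group_id": 12, "group_parent_flash_id": 101},
]


def build_hierarchy(events, groups):
    """Return {flash_id: {group_id: [event_id, ...]}} from parent-ID links."""
    events_by_group = defaultdict(list)
    for event in events:
        events_by_group[event["event_parent_group_id"]].append(event["event_id"])

    flashes = defaultdict(dict)
    for group in groups:
        flash_id = group["group_parent_flash_id"]
        flashes[flash_id][group["group_id"]] = events_by_group[group["group_id"]]
    return dict(flashes)


hierarchy = build_hierarchy(events, groups)
print(hierarchy)
```

Here flash 100 ends up owning groups 10 and 11, and flash 101 owns group 12, mirroring the events-to-groups-to-flashes nesting described above.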

Requirements

Resource    Minimum    Recommended
CPU         2 cores    4+ cores
RAM         6GB        16GB
Storage     8GB        24GB

Usage

The project can be run with Docker containers or installed locally.

docker-compose up # spin up containers

Installation

First, make sure the requirements are installed. They can be installed from the project directory with pip:

pip install . # requires Python <= 3.12

Start Flow

Run the following command to start the Prefect workflow orchestration server:

prefect server start # Start prefect engine and UI i.e. http://localhost:4200/

The Prefect orchestration platform is required for scheduling. From the Prefect UI, you can run and monitor the data flows.

Run the following commands to start the data app:

python lightning_containers/flows.py # Start backend

streamlit run app/dashboard.py # Start frontend i.e. http://localhost:8501/

ETL Flow

ETL flow data tasks:

  • Source: extracts NOAA GOES-R GLM file datasets from an AWS S3 bucket; the default is GOES-18.
  • Transformations: transforms the dataset into time series CSV.
  • Sink: loads the dataset to persistent storage.

Data Ingestion

Ingests the data needed for a specified time window, defined by start and end dates.
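
A minimal sketch of how a start/end window could be expanded into hourly S3 key prefixes. The `<product>/<year>/<day-of-year>/<hour>/` layout and the `hourly_prefixes` helper are assumptions for illustration, based on how the public NOAA GOES buckets are commonly organized; the project's own extract task may work differently.

```python
from datetime import datetime, timedelta


def hourly_prefixes(start, end, product="GLM-L2-LCFA"):
    """List one key prefix per hour in the half-open window [start, end).

    The prefix layout is an assumption, not taken from the project code.
    """
    # Floor the start to the top of its hour so partial hours are included.
    current = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while current < end:
        # %j is the zero-padded day of year, %H the zero-padded hour.
        prefixes.append(f"{product}/" + current.strftime("%Y/%j/%H") + "/")
        current += timedelta(hours=1)
    return prefixes


window = hourly_prefixes(datetime(2024, 1, 1, 22), datetime(2024, 1, 2, 1))
for prefix in window:
    print(prefix)
```

Each prefix could then be passed to an S3 listing call to find the files for that hour.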

Data Processes
  • extract: downloads NOAA GOES-R GLM netCDF4 files from an AWS S3 bucket.
  • transform: converts GLM netCDF4 files into time and geo series CSVs.
  • load: loads CSVs to a local backend: persistent SQLite with the SpatiaLite extension.
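
The transform and load steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: the column names, the `flashes` table, and the `to_csv` and `load_csv` helpers are hypothetical, the records are made up, and plain in-memory SQLite stands in for the SpatiaLite-enabled database.

```python
import csv
import io
import sqlite3

# Hypothetical flash records, as a transform step might produce them.
rows = [
    {"ts": "2024-01-01T00:00:00", "lat": 29.76, "lon": -95.37, "energy": 1.2e-14},
    {"ts": "2024-01-01T00:00:20", "lat": 30.27, "lon": -97.74, "energy": 3.4e-15},
]
FIELDS = ["ts", "lat", "lon", "energy"]


def to_csv(records):
    """Transform: serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def load_csv(conn, csv_text):
    """Load: insert CSV rows into a SQLite table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flashes (ts TEXT, lat REAL, lon REAL, energy REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    data = [(r["ts"], float(r["lat"]), float(r["lon"]), float(r["energy"])) for r in reader]
    conn.executemany("INSERT INTO flashes VALUES (?, ?, ?, ?)", data)
    conn.commit()
    return len(data)


conn = sqlite3.connect(":memory:")  # a file path would make the storage persistent
loaded = load_csv(conn, to_csv(rows))
count = conn.execute("SELECT COUNT(*) FROM flashes").fetchone()[0]
print(loaded, count)
```

Swapping `":memory:"` for a file path gives persistent storage; spatial columns and indexes would additionally require loading the SpatiaLite extension, which is omitted here.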

Clustering Flow

Cluster Analysis

Groups the ingested data using the k-means clustering algorithm.

Data Tasks
  • preprocessor: prepares the data for the cluster model by cleaning and normalizing it.
  • kmeans_cluster: fits the data to an implementation of the k-means clustering algorithm.
  • silhouette_evaluator: evaluates the choice of 'k' clusters by calculating the silhouette coefficient for each k in a defined range.
  • elbow_evaluator: evaluates the choice of 'k' clusters by calculating the sum of squared distances for each k in a defined range.
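
To make the evaluation tasks concrete, here is a small pure-Python sketch of k-means (Lloyd's algorithm) with both metrics computed over a range of k. It is not the project's code, which may well rely on a library: the function names and toy points are assumptions, and deterministic farthest-point seeding is swapped in for the usual random or k-means++ initialization so the result is repeatable.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then keep adding the point
    farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def inertia(points, centroids, labels):
    """Elbow metric: sum of squared distances to the assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette coefficient; expects at least two clusters."""
    clusters = sorted(set(labels))

    def mean_dist(i, cluster):
        others = [q for j, q in enumerate(points) if labels[j] == cluster and j != i]
        return sum(math.dist(points[i], q) for q in others) / len(others)

    scores = []
    for i in range(len(points)):
        if labels.count(labels[i]) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = mean_dist(i, labels[i])  # cohesion within the point's own cluster
        b = min(mean_dist(i, c) for c in clusters if c != labels[i])  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: three well-separated blobs of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]

results = {}
for k in range(2, 6):
    centroids, labels = kmeans(points, k)
    results[k] = (inertia(points, centroids, labels), silhouette(points, labels))
    print(k, results[k])

best_k = max(results, key=lambda k: results[k][1])
print("best k by silhouette:", best_k)
```

On this toy data the silhouette score peaks at k = 3, which is also where the inertia curve bends. Real lightning data would first go through the preprocessor step, since k-means is sensitive to feature scale.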

Dashboard Map

An example dashboard of flash event data points:
[Image: Lightning containers dashboard]

Testing

Use the following command to run tests:

pytest

License

Apache 2.0 License

Acknowledgements

This work would not have been possible without amazing open source software and datasets, including but not limited to:

  • GLM Dataset from NOAA NESDIS
  • Prefect from PrefectHQ
  • Built on the codebase of Lightning Streams.

Thank you to the authors of this software and these datasets for making them available to the community!
