cityxr-data-engine's Introduction

City XR Data Project

We’ll develop a data engine package to handle ingestion and querying of data. This will consist of two programs and supporting scripts:

  • The cxd-ingest program will run and continuously sample data according to its source plugins.
  • The cxd-server program will run and serve batch queries over some kind of GraphQL or RESTy interface, and will also route real-time messages to dataset subscribers via websockets.

Assumptions

  • The engine will use Postgres 11 as a backing data store.
  • The engine will target deployment on Linux servers.
  • Operators will have basic business analyst knowledge--e.g., ability to modify example stub scripts.

Data sources supported will be:

  • SQL queries via ODBC
  • Document fetching via HTTP
  • JSON payloads received via HTTP POST
  • Local filesystem access

Datasets will be one of:

  • Point-in-time data (“These businesses were at these locations 12 days ago.”)
  • Time-series data (“These locations had this temperature between May 5 and Nov 5.”)
  • Instantaneous data (“This is the current water level in this flood plain.”)
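
As a rough illustration of how the three dataset kinds differ, the sketch below models them by which timestamps a record carries. The names `DatasetKind`, `Observation`, and `is_well_formed` are hypothetical and not part of any existing engine API; the rule that instantaneous data carries no timestamp is likewise an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Any, Optional


class DatasetKind(Enum):
    """Hypothetical labels for the three dataset kinds listed above."""
    POINT_IN_TIME = "point_in_time"   # a snapshot valid at one moment
    TIME_SERIES = "time_series"       # values valid over a time range
    INSTANTANEOUS = "instantaneous"   # only the current value matters


@dataclass(frozen=True)
class Observation:
    dataset: str
    kind: DatasetKind
    value: Any
    start: Optional[datetime] = None  # snapshot time, or start of range
    end: Optional[datetime] = None    # end of range (time series only)


def is_well_formed(obs: Observation) -> bool:
    """Check that the timestamps present match the dataset kind (assumed rules)."""
    if obs.kind is DatasetKind.POINT_IN_TIME:
        return obs.start is not None and obs.end is None
    if obs.kind is DatasetKind.TIME_SERIES:
        return (
            obs.start is not None
            and obs.end is not None
            and obs.start <= obs.end
        )
    # Instantaneous: assumed to carry no timestamps at all.
    return obs.start is None and obs.end is None
```

For instance, a time-series record would carry both `start` and `end`, while a flood-plain reading would carry neither.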

Features

The overall engine will include:

  • Thorough documentation of engine architecture.
  • Tutorials for common engine use-cases.
  • Helper scripts to generate stub data source plugins.
  • Helper scripts to manage the engine.
  • Instructions for deployment.

The cxd-ingest program will:

  • Have a shell script for generating stub plugins to add data sources.
  • Have a plugin architecture for supporting data sources as described above:
    • Plugins will be written in Python 3.
    • Plugins will register themselves with the ingest program and run their setup (creation of database tables, etc.).
    • Plugins may either stay resident (if they need to receive webhooks) or schedule themselves for periodic servicing.
    • When serviced, plugins will run in two parts:
      • A fetch phase, where they receive or gather their data.
      • An ingest phase, where they process and clean the data.
  • Log errors and data quality conditions.
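
The plugin lifecycle described above (register, set up, then fetch and ingest on each servicing) could look something like the following minimal sketch. Nothing here is an existing cxd-ingest API: `SourcePlugin`, `register`, `service`, `PLUGINS`, `interval_seconds`, and the example plugin are all hypothetical names, and the fetch phase returns canned strings instead of contacting a real source.

```python
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

log = logging.getLogger("cxd-ingest")

# Hypothetical registry of plugin instances, keyed by plugin name.
PLUGINS: Dict[str, "SourcePlugin"] = {}


class SourcePlugin(ABC):
    name: str = "unnamed"
    # Assumed convention: None means "stay resident" (e.g. for webhooks),
    # a number means "service me every N seconds".
    interval_seconds: Optional[int] = 300

    def setup(self) -> None:
        """One-time setup, such as creating database tables."""

    @abstractmethod
    def fetch(self) -> Any:
        """Fetch phase: receive or gather raw data."""

    @abstractmethod
    def ingest(self, raw: Any) -> List[dict]:
        """Ingest phase: process and clean the raw data."""


def register(cls):
    """Class decorator: instantiate the plugin, run setup, record it."""
    plugin = cls()
    plugin.setup()
    PLUGINS[plugin.name] = plugin
    return cls


def service(plugin: SourcePlugin) -> List[dict]:
    """Run both phases once, logging any failure instead of crashing."""
    try:
        return plugin.ingest(plugin.fetch())
    except Exception:
        log.exception("plugin %s failed during servicing", plugin.name)
        return []


@register
class ExampleTemperatures(SourcePlugin):
    name = "example_temperatures"

    def fetch(self) -> List[str]:
        # Canned sample data standing in for an HTTP or ODBC fetch.
        return [" 21.5", "bad", "19.0 "]

    def ingest(self, raw: List[str]) -> List[dict]:
        rows = []
        for item in raw:
            try:
                rows.append({"celsius": float(item)})
            except ValueError:
                # Data quality condition: log and drop the reading.
                log.warning("%s: dropping unparseable reading %r", self.name, item)
        return rows
```

Keeping fetch and ingest as separate methods means a failed cleaning step can be logged or retried without contacting the source again, which is one plausible reason for the two-phase split.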

The cxd-server program will:

  • Provide at least one of a RESTy or GraphQL interface for querying ingested data.
  • Provide ability to retrieve batched data updates since some previous time (“Give me all new data since yesterday”).
  • Provide periodic data updates to clients subscribed to various datasets.
  • Provide a maintenance/debugging interface to see what data sources are available, what data they have, what their run status is, and to visualize that data.
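
To make the batched "all new data since yesterday" query concrete, here is a small sketch of the filtering logic. It uses Python's built-in `sqlite3` module with an in-memory database purely as a runnable stand-in for Postgres; the `observations` table, its columns, and the `rows_since` helper are assumptions for illustration, not a committed schema or API.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import List, Tuple


def rows_since(conn: sqlite3.Connection, dataset: str, since: datetime) -> List[Tuple[str, str]]:
    """Return (ingested_at, payload) rows for one dataset newer than `since`."""
    cur = conn.execute(
        "SELECT ingested_at, payload FROM observations "
        "WHERE dataset = ? AND ingested_at > ? "
        "ORDER BY ingested_at",
        # ISO-8601 strings in one consistent format sort chronologically.
        (dataset, since.isoformat()),
    )
    return cur.fetchall()


# In-memory stand-in for the real backing store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (dataset TEXT, ingested_at TEXT, payload TEXT)"
)

now = datetime(2020, 1, 10, 12, 0, 0)
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [
        ("temps", (now - timedelta(days=3)).isoformat(), "18.0"),
        ("temps", (now - timedelta(hours=2)).isoformat(), "21.5"),
        ("floods", (now - timedelta(hours=1)).isoformat(), "3.2"),
    ],
)

# "Give me all new temps data since yesterday."
recent = rows_since(conn, "temps", now - timedelta(days=1))
```

A client would remember the newest `ingested_at` value it received and pass that as `since` on its next request, so each batch picks up where the previous one ended.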
