RADAR-Schemas

Avro schemas used in RADAR-base. The schemas are organized as follows:

  • The commons directory contains all schemas used inside Kafka and data fed into Kafka.
    • In the active subdirectory, add schemas for active data collection, like questionnaires or assignments.
    • In the catalogue subdirectory, modify schemas for cataloguing data types.
    • In the kafka subdirectory, add schemas used throughout Kafka, like record keys.
    • In the monitor subdirectory, add schemas for monitoring applications that gather data.
    • In the passive subdirectory, add schemas for passive data collection, like wearables.
    • In the stream subdirectory, add schemas used in Kafka Streams.
  • The specifications directory contains specifications of what data types are collected through which devices.
  • Java SDKs for each of the components are provided in the java-sdk folder; see the installation instructions there. They are automatically generated from the Avro schemas using the Avro 1.8.2 specification.

Usage

This project is used in RADAR-base through the radarbase/kafka-init Docker image. To extend the schemas and specifications, create a local directory structure containing a commons and a specifications directory, and mount them into the image at /schema/conf/commons and /schema/conf/specifications, respectively. Additional settings can be supplied in a config.yaml file mounted into the container, with its path given in the CONFIG_YAML environment variable. The config file has the following format:

# Specify any Kafka properties needed to connect to the Kafka cluster
kafka:
  security.protocol: PLAINTEXT

# Configure additional topics, or specify the properties of a topic listed elsewhere.
topics:
  my_custom_topic:
    # Enable configuration of this topic
    enabled: true
    # Number of partitions to use, if newly created
    partitions: 3
    # Replication factor, if newly created
    replicationFactor: 2
    # Topic properties in Kafka. See
    # <https://docs.confluent.io/platform/current/installation/configuration/topic-configs.html#ak-topic-configurations-for-cp>
    properties:
      cleanup.policy: compact
    # Key schema for the topic
    keySchema: my.key.Schema
    # Value schema for the topic
    valueSchema: my.value.Schema
    # Whether to register the schemas to the Schema Registry
    registerSchema: false

# Schema configuration. This refers to the files in the commons directory.
schemas:
  # Only include given schema directory files. You can use File glob syntax as described in <https://docs.oracle.com/javase/8/docs/api/java/nio/file/FileSystem.html#getPathMatcher-java.lang.String->
  # If include is specified, exclude will be ignored. The glob pattern should start from the commons directory.
  include: []
  # Exclude all given schema directory files. You can use File glob syntax as described in <https://docs.oracle.com/javase/8/docs/api/java/nio/file/FileSystem.html#getPathMatcher-java.lang.String->
  # If include is specified, exclude will be ignored. The glob pattern should start from the commons directory.
  exclude:
    - active/**
  # You can specify additional schemas, using the format for each respective specification directory.
  monitor:
    # The object name is the path it would have, the value is a plain string containing a JSON object
    application/application_uptime2.avsc: >
      {
        "namespace": "org.radarcns.monitor.application",
        "type": "record",
        "name": "ApplicationUptime2",
        "doc": "Length of application uptime.",
        "fields": [
          { "name": "time", "type": "double", "doc": "Device timestamp in UTC (s)." },
          { "name": "uptime", "type": "double", "doc": "Time since last app start (s)." }
        ]
      }

# Source configuration. This refers to the files in the specifications directory.
sources:
  # Only include given specification directory files. You can use File glob syntax as described in <https://docs.oracle.com/javase/8/docs/api/java/nio/file/FileSystem.html#getPathMatcher-java.lang.String->
  # If include is specified, exclude will be ignored. The glob pattern should start from the specifications directory.
  include:
    - passive/*
  # Exclude all given specification directory files. You can use File glob syntax as described in <https://docs.oracle.com/javase/8/docs/api/java/nio/file/FileSystem.html#getPathMatcher-java.lang.String->
  # If include is specified, exclude will be ignored. The glob pattern should start from the specifications directory.
  exclude: []
  # You can specify additional sources, using the format for each respective specification directory.
  monitor:
    - vendor: test
      model: test
      version: 1.0.0
      data:
        type: UPTIME
        topic: application_uptime2
        value_schema: .monitor.application.ApplicationUptime2

Please see the inline comments for more information on the individual values.
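
For example, here is a minimal sketch of running the image with local extensions mounted in; the host-side paths and the config.yaml mount location inside the container are assumptions, only the /schema/conf/commons and /schema/conf/specifications mount points and the CONFIG_YAML variable come from the description above:

# Hypothetical host layout: ./my-schemas/commons, ./my-schemas/specifications, ./my-schemas/config.yaml
docker run --rm \
  -v "$PWD/my-schemas/commons:/schema/conf/commons" \
  -v "$PWD/my-schemas/specifications:/schema/conf/specifications" \
  -v "$PWD/my-schemas/config.yaml:/schema/conf/config.yaml" \
  -e CONFIG_YAML=/schema/conf/config.yaml \
  radarbase/kafka-init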

Contributing

The Avro schemas should follow the Google JSON style guide.

In addition, schemas in the commons directory should adhere to the following guidelines; an example schema illustrating them is shown after the list:

  • Try to avoid abbreviations in field names; write the words out in full instead.
  • There should be no need to add value at the end of a field name.
  • Enumerator items should be written in uppercase characters separated by underscores.
  • Add documentation (the doc property) to each schema, each field, and each enum. The documentation should show in text what is being measured, how, and what units or ranges are applicable. Abbreviations and acronyms in the documentation should be written out. Each doc property should start with a capital and end with a period.
  • Prefer a categorical specification (an Avro enum) over a free string if that categorization is expected to remain very stable. This disambiguates the possible values for analysis. If a field is expected to be extended outside this project or very often within this project, use a free string instead.
  • Prefer a flat record over a hierarchical record. This simplifies the organization of the data downstream, for example, when mapping to CSV.
  • Prefer written out fields to arrays. This simplifies the organization of the data downstream, for example, when mapping to CSV.
  • Give each schema a proper namespace, preferably org.radarcns.passive.<vendor> fully in lowercase, without any numbers, uppercase letters or symbols (except .). For the Empatica E4, the vendor is Empatica, so the namespace is org.radarcns.passive.empatica. For generic types, like a phone, Android Wear device or Android application, the namespace could just be org.radarcns.passive.phone, org.radarcns.passive.wear, or org.radarcns.monitor.application.
  • In the schema name, use upper camel case and name the device explicitly (for example, EmpaticaE4Temperature).
  • For fields that are inherent to a record, and will never be removed or renamed, no default value is needed. For all other fields:
    • if the type is an enum, use an UNKNOWN symbol as default value
    • otherwise, set the type to a union of ["null", <intended type>] and set the default value to null.
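
As an illustration, a sketch of a schema that follows these guidelines, loosely modeled on the EmpaticaE4Temperature example named above (the nullable field is included only to demonstrate the default-value guideline and does not necessarily mirror the actual schema in the repository):

{
  "namespace": "org.radarcns.passive.empatica",
  "type": "record",
  "name": "EmpaticaE4Temperature",
  "doc": "Skin temperature as measured by the Empatica E4 wristband.",
  "fields": [
    { "name": "time", "type": "double", "doc": "Device timestamp in UTC (s)." },
    { "name": "timeReceived", "type": "double", "doc": "Receiver timestamp in UTC (s)." },
    { "name": "temperature", "type": ["null", "float"], "doc": "Temperature of the skin (degrees Celsius).", "default": null }
  ]
}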

Validation phase

Avro schemas are automatically validated against the RADAR-base guidelines while building. For more details, check the catalog validator.
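
For instance, assuming the standard Gradle wrapper in the repository root (an assumption; adjust to your build setup), the validation runs as part of a regular build:

# assumes the Gradle wrapper is present in the repository root
./gradlew build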

Test setup

The RADAR schema tools can be tested locally using Docker. To run the tools, first install Docker. Then run

docker-compose build
docker-compose up -d zookeeper-1 kafka-1 schema-registry-1

Now you can run tools commands with

# usage
docker-compose run --rm tools
# validation
docker-compose run --rm tools radar-schemas-tools validate /schema/merged
# list topic information
docker-compose run --rm tools radar-schemas-tools list /schema/merged
# register schemas with the schema registry
docker-compose run --rm tools radar-schemas-tools register http://schema-registry-1:8081 /schema/merged
# create topics with zookeeper
docker-compose run --rm tools radar-schemas-tools create -s kafka-1:9092 -b 1 -r 1 -p 1 /schema/merged
# run source-catalogue webservice
docker-compose run -p 8080:8080 --rm tools radar-catalog-server -p 8080 /schema/merged
# and in a separate console, run
curl localhost:8080/source-types
# back up the _schemas topic
docker-compose run --rm tools radar-schemas-tools schema-topic --backup -b 1 -s kafka-1:9092 -f /schema/conf/backup.json /schema/merged
# ensure the validity of the _schemas topic
docker-compose run --rm tools radar-schemas-tools schema-topic --ensure -b 1 -s kafka-1:9092 -f /schema/conf/backup.json -r 1 /schema/merged

Using radar-schemas-tools with Confluent Cloud

  1. Create topics on Confluent Cloud

    1.1. Create a config.yaml file, based on the Confluent Cloud configuration template for Java applications:

    kafka:
      bootstrap.servers: {{ BROKER_ENDPOINT }}
      security.protocol: SASL_SSL
      sasl.jaas.config: org.apache.kafka.common.security.plain.PlainLoginModule required username="{{ CLUSTER_API_KEY }}" password="{{ CLUSTER_API_SECRET }}";
      ssl.endpoint.identification.algorithm: https
      sasl.mechanism: PLAIN

    1.2. Run the topic-create command:

    docker run --rm -v "$PWD/config.yaml:/etc/radar-schemas-tools/config.yaml" radarbase/kafka-init radar-schemas-tools topic-create -c /etc/radar-schemas-tools/config.yaml /schema/merged
    
  2. Register schemas on Confluent Cloud schema registry

    docker run --rm radarbase/kafka-init radar-schemas-tools register SR_ENDPOINT -u SR_API_KEY -p SR_API_SECRET /schema/merged
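
    To verify the registration afterwards, the registered subjects can be listed through the standard Schema Registry REST API; a sketch reusing the placeholders above:

    # lists the subjects registered in the Confluent Cloud schema registry
    curl -u SR_API_KEY:SR_API_SECRET https://SR_ENDPOINT/subjects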
    
