
adjust-data-challenge

Overview

This project has two main entry points:

WeatherDataLoader:

Loads the data into Postgres. The data is loaded denormalized into a single table, because all of it is needed during processing and there are no efficient keys on which to join separate tables.

Args: --inputPath - A directory containing files to be loaded
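The loader's source is not shown here, so the following is only a minimal Scala sketch of the denormalization described above. It assumes each input sounding consists of one header (station, date, position) followed by its level records. The case classes and the Denormalize object are hypothetical and abbreviated to a few of the table's columns; the point is that every level is paired with a copy of its header, so weather_balloon_data can be a single table that never needs a join.

```scala
// Hypothetical, abbreviated record types; the project's own classes are not shown.
final case class SoundingHeader(id: String, soundingDate: Int, hour: Int, latitude: Int, longitude: Int)
final case class LevelRecord(pressure: Int, geoPotentialHeight: Int, temperature: Int)

// One row of weather_balloon_data: header fields repeated next to a single level.
final case class FlatRow(header: SoundingHeader, level: LevelRecord)

object Denormalize {
  // Pair every level with the header it belongs to (a flatten, not a join).
  // A header with no levels produces no rows.
  def flatten(soundings: Seq[(SoundingHeader, Seq[LevelRecord])]): Seq[FlatRow] =
    for {
      (header, levels) <- soundings
      level <- levels
    } yield FlatRow(header, level)
}
```

Each FlatRow would then map one-to-one onto an insert into weather_balloon_data, at the cost of repeating the header columns on every level.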

WeatherDataProcessor:

A Spark job that connects to the database, reads the data, buckets it into 1,000-meter altitude bands based on the geo_potential_height column, and writes the bucketed data as partitioned Parquet files.

Args: --outputPath - A directory to write the parquet files.
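As a rough illustration, the 1,000 m bucketing rule can be written in plain Scala. This assumes geo_potential_height holds meters and that bucket n covers heights from n*1000 to n*1000 + 999. The AltitudeBucket object and its methods are hypothetical; the actual job presumably applies the rule to a Spark DataFrame column.

```scala
object AltitudeBucket {
  // Width of one altitude band in meters ("thousands of meters" in the README).
  val BucketWidthMeters: Int = 1000

  // Map a geopotential height in meters to its 1,000 m band index.
  // Math.floorDiv rounds toward negative infinity, so a height of -1
  // lands in band -1 instead of being merged into band 0.
  def bucketOf(geoPotentialHeight: Int): Int =
    Math.floorDiv(geoPotentialHeight, BucketWidthMeters)

  // Human-readable label for a band, e.g. band 1 -> "1000-1999".
  def label(bucket: Int): String = {
    val low = bucket * BucketWidthMeters
    s"$low-${low + BucketWidthMeters - 1}"
  }
}
```

Inside Spark, an equivalent column could be derived with floor(col("geo_potential_height") / 1000) from org.apache.spark.sql.functions and then used with df.write.partitionBy(...).parquet(outputPath); whether the project partitions on such a bucket column is an assumption. If the source data marks missing heights with a negative sentinel value, those rows would need to be filtered out before bucketing.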

Data Model

CREATE TABLE weather_balloon_data (
  id char(100),
  sounding_date integer,
  hour integer,
  release_time integer,
  number_of_levels integer,
  pressure_source_code char(50),
  non_pressure_source_code char(50),
  latitude integer,
  longitude integer,
  major_level_type integer,
  minor_level_type integer,
  elapsed_time_since_launch integer,
  pressure integer,
  pressure_flag char(50),
  geo_potential_height integer,
  geo_potential_height_flag char(50),
  temperature integer,
  temperature_processing_flag char(50),
  relative_humidity integer,
  dew_point_depression integer,
  wind_direction integer,
  wind_speed integer
);

CREATE INDEX weather_balloon_idx ON weather_balloon_data (sounding_date);
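The README does not say how the sounding_date index is used. One plausible use, sketched here purely as an assumption, is for the Spark job to read the table in parallel with one range predicate per partition, so that Postgres can consult the index for each slice. The DatePredicates helper below is hypothetical: it splits a half-open range [lower, upper) of sounding_date values into contiguous, non-overlapping WHERE-clause fragments.

```scala
object DatePredicates {
  // Split [lower, upper) into at most `parts` contiguous, non-overlapping
  // ranges on sounding_date and render each as a SQL predicate string.
  def ranges(lower: Int, upper: Int, parts: Int): Seq[String] = {
    require(parts > 0, "parts must be positive")
    require(upper > lower, "range must be non-empty")
    val span = upper.toLong - lower
    // Evenly spaced boundaries; Long arithmetic avoids Int overflow, and
    // distinct drops empty slices when parts exceeds the number of values.
    val bounds = (0 to parts).map(i => (lower + span * i / parts).toInt).distinct
    bounds.zip(bounds.tail).map { case (from, to) =>
      s"sounding_date >= $from AND sounding_date < $to"
    }
  }
}
```

The resulting strings could be passed to Spark's DataFrameReader.jdbc(url, "weather_balloon_data", predicates.toArray, connectionProperties), which creates one partition per predicate; the URL and connection properties are placeholders. If sounding_date encodes calendar dates as integers such as yyyymmdd (not confirmed by the README), evenly split integer ranges will not hold equal numbers of days, but they still cover the range without gaps or overlaps.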

How to Run

  1. Run the tests:
     test (from the sbt shell) or sbt "test" (from a terminal)
  2. Build the project:
     clean;assembly (from the sbt shell) or sbt "clean;assembly" (from a terminal)
  3. Run the application:

Load weather data:

java -cp target\scala-2.12\adjust-data-challenge.jar com.adjust.data.WeatherDataLoader --inputPath "C:\Users\User\Downloads\USM00070219-data"

Process weather data:

java -cp target\scala-2.12\adjust-data-challenge.jar com.adjust.data.WeatherDataProcessor --outputPath "C:\Users\User\Downloads\USM00070219-data-output"

Alternatively, run the loader from sbt:

sbt "runMain com.adjust.data.WeatherDataLoader --inputPath C:\Users\User\Downloads\USM00070219-data"
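Both entry points take their settings as --name value pairs. How the project actually parses them is not documented; a minimal hypothetical Scala helper that turns such pairs into a map and reports a missing required option might look like this.

```scala
object CliArgs {
  // Read arguments positionally as (flag, value) pairs, e.g.
  // Array("--inputPath", "C:\\data") -> Map("inputPath" -> "C:\\data").
  // A pair whose first element does not start with "--", or a trailing
  // flag with no value, is silently skipped by this sketch.
  def parse(args: Array[String]): Map[String, String] =
    args.grouped(2).collect {
      case Array(flag, value) if flag.startsWith("--") => flag.drop(2) -> value
    }.toMap

  // Fetch a required option or fail with a readable message.
  def required(opts: Map[String, String], name: String): String =
    opts.getOrElse(name, throw new IllegalArgumentException(s"missing --$name"))
}
```

For example, a main method could call CliArgs.required(CliArgs.parse(args), "inputPath"). Because malformed pairs are skipped rather than rejected, a stricter parser (or a library such as scopt) may be preferable in practice.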

Contributors

prasanna-ds
