Overview

This project has two main entry points:

WeatherDataLoader:

Loads the input files into Postgres. The data is stored in a single flat (denormalized) table, because the processing step needs every column and the data offers no efficient keys for joining separate tables.

Args: --inputPath - A directory containing files to be loaded
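A minimal sketch of how "--name value" arguments such as --inputPath could be read. The object name CliArgs and its parse method are hypothetical and are not taken from the project, which may use a dedicated argument-parsing library instead.

```scala
object CliArgs {
  // Collects "--name value" pairs into a map keyed by the name without dashes.
  // A trailing token without a partner, or a pair whose first token does not
  // start with "--", is ignored.
  def parse(args: Array[String]): Map[String, String] =
    args.sliding(2, 2).collect {
      case Array(flag, value) if flag.startsWith("--") => flag.drop(2) -> value
    }.toMap
}
```

With this sketch, CliArgs.parse(Array("--inputPath", "/data/in")) yields a map from "inputPath" to "/data/in", and the same helper would serve --outputPath.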

WeatherDataProcessor:

A Spark job that connects to the database, reads the data, buckets it into 1,000-meter altitude bands based on the geo_potential_height column, and writes the bucketed data as partitioned Parquet files.

Args: --outputPath - A directory to write the parquet files.
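The bucketing step can be illustrated in plain Scala, without Spark. This is only a sketch under assumptions: the names AltitudeBucket, bucketStart and groupByBucket are hypothetical, each bucket is labeled by its lower bound in meters, and how the actual job labels buckets or treats missing heights is not stated here.

```scala
object AltitudeBucket {
  // Width of one altitude band in meters.
  val BucketSizeMeters: Int = 1000

  // Lower bound of the band that contains the given height.
  // floorDiv rounds toward negative infinity, so a height of -1
  // lands in the band starting at -1000 rather than at 0.
  def bucketStart(geoPotentialHeight: Int): Int =
    java.lang.Math.floorDiv(geoPotentialHeight, BucketSizeMeters) * BucketSizeMeters

  // Groups heights by the band they fall into.
  def groupByBucket(heights: Seq[Int]): Map[Int, Seq[Int]] =
    heights.groupBy(bucketStart)
}
```

In a Spark job the same arithmetic could be expressed as a derived column and then used as the partition column when writing Parquet, so that each band ends up in its own directory.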

Data Model

create table weather_balloon_data (
  id char(100),
  sounding_date integer,
  hour integer,
  release_time integer,
  number_of_levels integer,
  pressure_source_code char(50),
  non_pressure_source_code char(50),
  latitude integer,
  longitude integer,
  major_level_type integer,
  minor_level_type integer,
  elapsed_time_since_launch integer,
  pressure integer,
  pressure_flag  char(50),
  geo_potential_height integer,
  geo_potential_height_flag  char(50),
  temperature integer,
  temperature_processing_flag   char(50),
  relative_humidity  integer,
  dew_point_depression integer,
  wind_direction integer,
  wind_speed integer
);

CREATE INDEX weather_balloon_idx ON weather_balloon_data (sounding_date);
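Because sounding_date is an indexed integer column, one possible way for the Spark job to read the table in parallel is a partitioned JDBC read on that column. The sketch below only builds the option map; the object name JdbcReadOptions, the method name, and all argument values are hypothetical, and whether the project actually partitions its read this way is an assumption.

```scala
object JdbcReadOptions {
  // Builds options for Spark's JDBC data source.
  // partitionColumn, lowerBound, upperBound and numPartitions must be set together;
  // the bounds only decide how the range is split and do not filter rows.
  def partitionedBySoundingDate(
      url: String,
      minDate: Int,
      maxDate: Int,
      partitions: Int
  ): Map[String, String] =
    Map(
      "url" -> url,
      "dbtable" -> "weather_balloon_data",
      "partitionColumn" -> "sounding_date",
      "lowerBound" -> minDate.toString,
      "upperBound" -> maxDate.toString,
      "numPartitions" -> partitions.toString
    )
}
```

The resulting map could be passed to spark.read.format("jdbc").options(...).load(), together with credentials supplied separately.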

How to Run

Run Tests

test (from sbt-shell) or sbt "test" (from terminal)

Build the project

clean;assembly (from sbt-shell) or sbt "clean;assembly" (from terminal)

Run the application

Load Weather data:

java -cp target\scala-2.12\adjust-data-challenge.jar com.adjust.data.WeatherDataLoader --inputPath "C:\Users\User\Downloads\USM00070219-data"

Process Weather data:

java -cp target\scala-2.12\adjust-data-challenge.jar com.adjust.data.WeatherDataProcessor --outputPath "C:\Users\User\Downloads\USM00070219-data-output"

Alternatively, run the loader through sbt without building the jar:

sbt "runMain com.adjust.data.WeatherDataLoader --inputPath C:\Users\User\Downloads\USM00070219-data"

prasanna-ds / adjust-data-challenge
