Big Data Project

Requirements

This script has been tested on Ubuntu 18.04.

Install Docker before running this script.
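
The Docker requirement can be checked before starting. This is a minimal sketch and not part of the project: `require_cmd` is a hypothetical helper name, and it only tests that a command is on `PATH`, not that the Docker daemon is running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named command is on PATH, fail otherwise.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    printf '%s found: %s\n' "$1" "$(command -v "$1")"
  else
    printf '%s not found; install it before running the deploy script\n' "$1" >&2
    return 1
  fi
}

# Usage (run before step 1 of the instructions):
#   require_cmd docker || exit 1
```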

Instructions

  1. Run the Hortonworks Data Platform (HDP) Docker deployment script.

    $ cd HDP_3.0.1_docker-deploy-scripts
    $ bash docker-deploy-hdp30.sh
  2. Start the HDP Docker containers.

    $ docker start sandbox-hdp
    $ docker start sandbox-proxy
  3. Restart all Hadoop services from the Ambari dashboard.

    http://localhost:8080
    Username: raj_ops
    Password: raj_ops
  4. Copy these files to the /root folder of the HDP container.

    Ingestion/copy_processed_files.sh
    Ingestion/ingestion.sh
    Ingestion/schema.sql
    ETL_Spark/build/AllAgeJob.jar
    ETL_Spark/build/FemaleJob.jar

    Commands

    $ scp -P 2222 Ingestion/* root@localhost:~/
    $ scp -P 2222 ETL_Spark/build/* root@localhost:~/
    • You will be asked to change the default password (default password: hadoop).
  5. Run the following commands sequentially.

    # Log in to the sandbox-hdp container
    $ ssh -p 2222 root@localhost
    $ bash ingestion.sh
    $ hive -f schema.sql
    $ spark-submit --class com.msd.AllAgeApp --master yarn --deploy-mode client AllAgeJob.jar
    $ spark-submit --class com.msd.FemaleApp --master yarn --deploy-mode client FemaleJob.jar
    $ bash copy_processed_files.sh
  6. Verify the result by querying the Hive tables (from inside the sandbox container).

    $ hive

    First Output

    SELECT * FROM msd.yearly_avg_all_age;

    Second Output

    SELECT * FROM msd.yearly_avg_female;
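
The commands in step 5 can be chained so that a failed stage stops the rest. The following is a minimal sketch, not part of the project: `run`, `submit`, `run_pipeline`, and the `DRY_RUN` variable are hypothetical names, and the script is assumed to run inside the sandbox from /root after step 4. With `DRY_RUN` left at its default of 1, it only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the step 5 commands.
# DRY_RUN=1 (default): print each command. DRY_RUN=0: print and execute it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# $1 = main class, $2 = jar file; the flags are the ones used in step 5.
submit() {
  run spark-submit --class "$1" --master yarn --deploy-mode client "$2"
}

# Each stage runs only if the previous one succeeded.
run_pipeline() {
  run bash ingestion.sh &&
  run hive -f schema.sql &&
  submit com.msd.AllAgeApp AllAgeJob.jar &&
  submit com.msd.FemaleApp FemaleJob.jar &&
  run bash copy_processed_files.sh
}

# Usage: call run_pipeline to preview, then set DRY_RUN=0 to execute.
```

Chaining with `&&` means the Spark jobs are never submitted if ingestion or schema creation fails, which matches the "sequentially" requirement of step 5.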

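The two verification queries from step 6 can also be run without opening the interactive prompt, using the `-e` option of the `hive` command. This is a sketch under assumptions: `query_for` and `check_table` are hypothetical helper names, and `hive` is assumed to be on `PATH` inside the sandbox. With `DRY_RUN` unset or set to 1, the command is only printed.

```shell
#!/usr/bin/env bash
# Build the SELECT statement from step 6 for a table in the msd database.
query_for() {
  printf 'SELECT * FROM msd.%s;' "$1"
}

# Print the hive command (DRY_RUN=1, default) or run it (DRY_RUN=0).
check_table() {
  q="$(query_for "$1")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'hive -e "%s"\n' "$q"
  else
    hive -e "$q"
  fi
}

# Usage:
#   check_table yearly_avg_all_age
#   check_table yearly_avg_female
```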
Contributors

nakamura41
