This script has been tested on Ubuntu 18.04. Please install Docker before running this script.
- Run the Hortonworks Data Platform Docker build script:

  $ cd HDP_3.0.1_docker-deploy-scripts
  $ bash docker-deploy-hdp30.sh
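  The deploy script pulls the HDP sandbox image, which is several gigabytes, so the first run can take a while. Once it finishes, the downloaded image should be visible with:

  $ docker images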
- Start the HDP Docker containers:

  $ docker start sandbox-hdp
  $ docker start sandbox-proxy
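  To confirm both containers are up before moving on, you can check with:

  $ docker ps --filter "name=sandbox"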
- Restart all of the Hadoop services from the Ambari dashboard:

  http://localhost:8080 (Username: raj_ops, Password: raj_ops)
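  Ambari can take a few minutes to become responsive after the containers start. A quick probe such as the one below prints the HTTP status code; a 200 response indicates the dashboard is ready for login:

  $ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080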
- Copy these files to the /root folder of the HDP Docker container:

  - Ingestion/copy_processed_files.sh
  - Ingestion/ingestion.sh
  - Ingestion/schema.sql
  - ETL_Spark/build/AllAgeJob.jar
  - ETL_Spark/build/FemaleJob.jar

  Command:

  $ scp -P 2222 Ingestion/* root@localhost:~/
  $ scp -P 2222 ETL_Spark/build/* root@localhost:~/

  You will be asked to change the default password (default password: hadoop).
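  To double-check that all five files landed in the container's /root folder, you can list them over the same SSH port:

  $ ssh -p 2222 root@localhost 'ls -l ~/'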
- Run the following commands sequentially:

  # Enter the sandbox-hdp Docker container
  $ ssh -p 2222 root@localhost
  $ bash ingestion.sh
  $ hive -f schema.sql
  $ spark-submit --class com.msd.AllAgeApp --master yarn --deploy-mode client AllAgeJob.jar
  $ spark-submit --class com.msd.FemaleApp --master yarn --deploy-mode client FemaleJob.jar
  $ bash copy_processed_files.sh
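  Both spark-submit jobs run on YARN in client mode, so their driver logs stream to your terminal. If you want to monitor them from a second shell inside the sandbox, the standard YARN CLI works:

  $ yarn application -list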
- You can verify the results by querying the Hive tables (from inside the sandbox container):

  $ hive

  First output:

  SELECT * FROM msd.yearly_avg_all_age;

  Second output:

  SELECT * FROM msd.yearly_avg_female;
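  If you prefer not to open the interactive shell, the same checks can be run non-interactively with hive -e, for example:

  $ hive -e 'SELECT * FROM msd.yearly_avg_all_age LIMIT 10;'
  $ hive -e 'SELECT * FROM msd.yearly_avg_female LIMIT 10;'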