This README documents my learning process for Hadoop.
I initially started learning about Big Data in general, but homed in on Hadoop after some research.
-
Decided to learn NoSQL and Big Data 🥳
-
Watched video comparing relational databases (which I know) to NoSQL databases (which I don't know)
-
Started a course on Big Data + Hadoop
-
Started a MongoDB crash course, as I've already used MongoDB for work, and for personal web projects
-
Did more research and decided to learn Hadoop instead
-
Learned about the evolution of technology that led to Big Data
-
Learned the definition of Big Data: datasets so large and complex that they can't be processed using traditional tools
-
Learned about the 5 Vs of Big Data: Volume, Velocity, Variety, Value, Veracity
-
Decided to start a more practical course, as my current Hadoop course is very theoretical and lecture-based
-
Started downloading the HDP Sandbox for VirtualBox (6-hour download time)
-
Sandbox won't import from downloaded file
↳ Error "Failed to import appliance C:/Users/nikun/Downloads/HDP_2.5_virtualbox.ova.
Result Code: E_INVALIDARG (0x80070057)"
-
Searched the internet extensively for a solution
-
Finally found a workaround: extracting the VMDK file and running it separately
-
Workaround didn't work; it turns out the VirtualBox file is corrupted, so I'll download it again overnight
-
Downloaded the HDP Sandbox again
-
Got further along than before
-
Ran into a memory error
-
Stack Overflow tells me I don't have enough RAM
↳ Need 8GB of free memory, but I only have 8GB of total memory
-
Finally got Hadoop running after upgrading my RAM
-
Navigated to the localhost port provided by Ambari, but I'm seeing loads of errors
-
Errors fixed themselves (the processes must have been initialising)
-
Uploaded 2 movie record databases using the Hive interface
-
Used Hive's SQL-like syntax to query for the most popular movie up to 1998
↳ The winner was Star Wars (1977)
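The query itself boiled down to grouping ratings by movie and counting them. The same aggregation can be sketched in plain Python; the sample rows and "most-rated = most popular" definition below are my own stand-ins for the MovieLens-style data, not the actual uploaded tables.

```python
from collections import Counter

# Hypothetical sample of (user_id, movie_title, rating) rows, standing in
# for the movie ratings data uploaded through the Hive interface.
ratings = [
    (1, "Star Wars (1977)", 5),
    (2, "Star Wars (1977)", 4),
    (3, "Star Wars (1977)", 5),
    (1, "Toy Story (1995)", 4),
    (2, "Toy Story (1995)", 3),
]

# "Most popular" here means most-rated: count ratings per title and take
# the top one, which is what a GROUP BY title / ORDER BY COUNT(*) DESC
# style HiveQL query computes.
counts = Counter(title for _, title, _ in ratings)
most_popular, num_ratings = counts.most_common(1)[0]
print(most_popular, num_ratings)  # Star Wars (1977) 3
```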
-
Decided to brush up on my SQL and started a separate SQL course
-
Completed the SQL course
-
Returned to Hive, where I'm now extremely comfortable using its SQL-like syntax
-
Used the HDFS web interface to upload and view data files through its explorer
-
Opened a command-line SSH connection to the HDP Sandbox using PuTTY
-
Created and deleted data files using the command line
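A sketch of the kind of `hdfs dfs` session this involves, run over the PuTTY connection into the sandbox. The subcommands are standard Hadoop filesystem shell commands, but the user, directory, and file names are hypothetical examples.

```shell
hdfs dfs -mkdir -p /user/maria_dev/ml-100k    # create a directory in HDFS
hdfs dfs -put u.data /user/maria_dev/ml-100k  # upload a local file into it
hdfs dfs -ls /user/maria_dev/ml-100k          # list the directory contents
hdfs dfs -rm /user/maria_dev/ml-100k/u.data   # delete the uploaded file
hdfs dfs -rmdir /user/maria_dev/ml-100k       # remove the now-empty directory
```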
-
Learned about MapReduce on a conceptual level: mapper, shuffle & sort, reducer
-
Also learned how MapReduce works across a cluster
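The mapper / shuffle & sort / reducer flow above can be imitated in a single Python process with a classic word count. This is a toy sketch of the concept, not Hadoop's actual API; all function names here are my own.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the line.
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: sum the 1s emitted for this word.
    return (word, sum(counts))

def map_reduce(lines):
    # Run all mappers, then sort by key (the "shuffle & sort" step,
    # which on a real cluster routes each key to one reducer node),
    # then run one reducer call per distinct key.
    mapped = [pair for line in lines for pair in mapper(line)]
    mapped.sort(key=itemgetter(0))
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(mapped, key=itemgetter(0))
    )

print(map_reduce(["the quick brown fox", "the lazy dog the end"]))
```

On a real cluster the mappers and reducers run in parallel on different nodes; the shuffle & sort step is what lets each reducer see every value for its key.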
-
Discovered VirtualBox Snapshots on my own, which skip the long wait for the Ambari dashboard to load all processes
-
Pausing Hadoop to learn Kafka! https://github.com/johnobla/kafka