hadoop's Introduction

Learning Process

This README documents my learning process for Hadoop.
I initially started learning about Big Data in general, but homed in on Hadoop after some research.

  1. Decided to learn NoSQL and Big Data 🥳

  2. Watched video comparing relational databases (which I know) to NoSQL databases (which I don't know)

  3. Started a course on Big Data + Hadoop

  4. Started a MongoDB crash course, as I've already used MongoDB for work, and for personal web projects

  5. Did more research and decided to learn Hadoop instead

  6. Learned about the evolution of technology that led to Big Data

  7. Learned the definition of Big Data: datasets so large and complex they can't be processed using traditional tools

  8. Learned about the 5 Vs of Big Data: Volume, Velocity, Variety, Value, Veracity

  9. Decided to start a more practical course, as my current Hadoop course is very theoretical and lecture-based

  10. Started downloading the HDP Sandbox for VirtualBox (6 hour download time 🙄)

  11. Sandbox won't import from downloaded file

    • Error: "Failed to import appliance C:/Users/nikun/Downloads/HDP_2.5_virtualbox.ova. Result Code: E_INVALIDARG (0x80070057)"

  1. Searched the internet extensively for a solution

  2. Finally found a workaround by extracting the VMDK file and running it separately

  3. Workaround didn't work; it turns out the VirtualBox file is corrupted, so I will download it again overnight

  4. Downloaded the HDP Sandbox again

  5. Got further along than before

  6. Ran into a memory error

  7. Stack Overflow tells me I don't have enough RAM

    • Need 8 GB of free memory; I only have 8 GB of memory in total

  8. Finally got Hadoop running after upgrading my RAM

  9. Navigated to the localhost port provided by Ambari, but saw loads of errors

  10. Errors fixed themselves (the processes must have been initialising)

  11. Uploaded 2 movie record databases using the Hive interface

  12. Used Hive's SQL to query for the most popular movie up to 1998

    • The winner was Star Wars (1977) 🎉

  13. Decided to brush up on my SQL and started a separate SQL course

  14. Completed SQL course 🎉

  15. Returned to Hive, where I'm now extremely comfortable using its SQL-like syntax

  16. Used the HDFS web interface to upload and view data files through its explorer

  17. Opened a command line SSH connection to the HDP Sandbox using PuTTY

  18. Created and deleted data files using the command line

  19. Learned about MapReduce on a conceptual level: mapper, shuffle & sort, reducer

  20. Also learned how MapReduce works across a cluster

  21. Discovered VirtualBox Snapshots on my own, which let me skip the long wait for the Ambari dashboard to load all processes

  22. Pausing Hadoop to learn Kafka! https://github.com/johnobla/kafka
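
To make the mapper, shuffle & sort, and reducer stages from the list above more concrete, here is a minimal in-memory sketch in Python. It is not how Hadoop or Hive actually run a job, and it is not the query I used: the `RATINGS` data, the function names, and the idea of treating "most popular" as "most rated" are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (user_id, movie_id) rating events.
# These values are invented for illustration, not taken from the course dataset.
RATINGS = [
    (1, 50), (2, 50), (3, 50),
    (1, 172), (2, 172),
    (3, 133),
]


def mapper(record):
    """Map step: emit one (movie_id, 1) pair per rating."""
    _user_id, movie_id = record
    yield movie_id, 1


def shuffle_and_sort(pairs):
    """Shuffle & sort step: group values by key, then order the groups by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reducer(key, values):
    """Reduce step: sum the counts collected for one movie."""
    return key, sum(values)


def run_job(records):
    """Run the three stages in sequence on a single machine."""
    mapped = [pair for record in records for pair in mapper(record)]
    return [reducer(key, values) for key, values in shuffle_and_sort(mapped)]


counts = run_job(RATINGS)
# Assumption: "most popular" is read here as "has the most ratings".
most_rated = max(counts, key=lambda kv: kv[1])
print(counts)
print(most_rated)
```

On a real cluster the mappers and reducers run on different nodes and the framework handles the shuffle between them; this sketch only shows how the data changes shape at each stage.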
