#4 MapReduce Basics

1. What commands did you use to run each script?

python3.5 FirstJob.py -r hadoop hdfs:///Data/ > Output_first.csv

python3.5 SecondJob.py -r hadoop hdfs:///Data/ > Output_second.csv

python3.5 ThirdJob.py -r hadoop hdfs:///Data/ > Output_third.csv

python3.5 FourthJob.py -r hadoop hdfs:///Data/ > Output_fourth.csv

2. What technical errors did you experience?

I kept getting the "Retrying connect to server" error. I force-stopped the execution and retried once or twice, after which it worked fine. I tried restarting Docker multiple times, but have not found a permanent solution yet.

3. What conceptual difficulties did you experience?

I was not familiar with jar or class files, so using them like functions was new to me.

4. How much time did you spend on each part of the assignment? Track your time according to the following items: Gitlab & Git, Docker setup/usage, actual reflection work, etc.

Gitlab & Git: 5 minutes

Docker setup/usage: 1 hour

Actual reflection work: 4 to 5 days

5. What was the hardest part of this assignment?

The hardest part was waiting for each execution to finish. Checking whether my functions worked correctly on Hadoop was so slow that it took up most of my time on this project.

6. What was the easiest part of this assignment?

Creating the Word.txt file.

7. What did you actually learn from doing this assignment?

I learned how to write Mapper() and Reducer() functions that run on Hadoop Streaming, which uses STDIN and STDOUT to pass data to and from them. I also learned that the open-source framework mrjob, which wraps Hadoop Streaming jobs, makes running MapReduce jobs on Hadoop much simpler.
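To make the STDIN/STDOUT idea concrete, here is a minimal sketch of a streaming-style word count. It is not the code from FirstJob.py or the other job scripts; the word-count task, the function names `mapper`, `reducer`, and `run_locally`, and the sample text are all assumptions chosen for illustration. The sketch imitates the map, sort, and reduce phases in a single process, so the logic can be checked without waiting for Hadoop.

```python
import io
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated "word<TAB>1" record per word,
    the way a streaming mapper would write records to STDOUT."""
    for line in lines:
        for word in line.split():
            yield "{}\t1".format(word.lower())


def reducer(sorted_records):
    """Sum the counts for each word.

    Assumes records arrive sorted by key, which is what the
    shuffle/sort phase provides before the reducer runs."""
    pairs = (record.split("\t") for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield "{}\t{}".format(word, total)


def run_locally(text):
    """Hypothetical helper: simulate map -> sort -> reduce in one process."""
    mapped = mapper(io.StringIO(text))
    return list(reducer(sorted(mapped)))


if __name__ == "__main__":
    # Sample input is made up for this sketch.
    for record in run_locally("big data big jobs\nData jobs\n"):
        print(record)
```

In a real streaming job the mapper and reducer would read from `sys.stdin` and print to STDOUT as separate processes; a framework such as mrjob hides that plumbing behind a job class.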

8. Why does what I learned matter both academically and practically?

MapReduce has been a standard approach at large companies for years. The open-source mrjob framework was created to make MapReduce tasks simpler. This suggests that MapReduce is widely used, from small companies to very large ones. As a data scientist, it is important to understand how big data is managed on servers and to be able to work with it.

Contributors: ryandhjeon
