Big Data Project - Inverted-Index

What is Inverted Indexing?
An inverted index is a data structure that most search engines use to quickly find the documents or web pages that contain specific words or phrases. Each word in a document is indexed together with a reference to its location in the document. Lookups are fast because a query reads only the entries for the words it contains instead of scanning every document in full.
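To make the idea concrete, here is a minimal in-memory sketch in plain Java. It is not the project's InvertedIndex.java; the class name `PositionalIndexSketch`, the document names `doc1` and `doc2`, and the choice to lower-case words are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Minimal in-memory inverted index: word -> (document name -> word positions).
// Hypothetical illustration, not the project's MapReduce code.
class PositionalIndexSketch {

    // Builds the index from a map of document name -> document text.
    static Map<String, Map<String, List<Integer>>> build(Map<String, String> docs) {
        Map<String, Map<String, List<Integer>>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            String text = doc.getValue().trim();
            if (text.isEmpty()) {
                continue; // nothing to index in an empty document
            }
            String[] words = text.toLowerCase(Locale.ROOT).split("\\s+");
            for (int pos = 0; pos < words.length; pos++) {
                // Record the position of this word under its document name.
                index.computeIfAbsent(words[pos], w -> new TreeMap<>())
                     .computeIfAbsent(doc.getKey(), d -> new ArrayList<>())
                     .add(pos);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "Hello World Bye World");
        docs.put("doc2", "Hello Hadoop Goodbye Hadoop");
        Map<String, Map<String, List<Integer>>> index = build(docs);
        // A lookup reads only the entry for the word, not the documents.
        System.out.println(index.get("world")); // prints {doc1=[1, 3]}
        System.out.println(index.get("hello")); // prints {doc1=[0], doc2=[0]}
    }
}
```

Once the index is built, a lookup is a single map access, which is why the search does not need to scan the documents themselves.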

Hadoop version used: 3.2.1

Java version used: 1.8.0_362

Steps to follow:

  1. Create two sample files in a folder named input inside Downloads:
    sample1.txt
    5722018411 Hello World Bye World

    sample2.txt
    6722018415 Hello Hadoop Goodbye Hadoop

  2. Create an input directory named "input" in HDFS with the command:
    hdfs dfs -mkdir /input

  3. Copy the sample files from Downloads/input to the /input directory in HDFS:
    hdfs dfs -put /home/keshavgarg/Downloads/input/sample1.txt /input/
    hdfs dfs -put /home/keshavgarg/Downloads/input/sample2.txt /input/

    Change to the hadoop-3.2.1 folder using the command 'cd ..'

  4. Then run the job with the command:
    bin/hadoop jar share/hadoop/mapreduce/invertedindex.jar InvertedIndex /input /output

  5. View the result, which the job writes to the /output directory:
    hdfs dfs -cat /output/part-r-00000
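The job's logic can be approximated without Hadoop by simulating the map and reduce phases in plain Java. This is only a sketch: InvertedIndex.java is not shown here, so the class name `MapReduceSketch`, the assumption that the first token of each sample line is a document ID, and the output layout are all hypothetical and may differ from what the real job writes to part-r-00000.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java simulation of the map and reduce phases (no Hadoop dependency).
// Hypothetical illustration, not the project's InvertedIndex.java.
class MapReduceSketch {

    // Map phase: treat the first token as the document ID (an assumption)
    // and emit one (word, docId) pair for every remaining token.
    static List<String[]> map(String line) {
        List<String[]> pairs = new ArrayList<>();
        String[] tokens = line.trim().split("\\s+");
        for (int i = 1; i < tokens.length; i++) {
            pairs.add(new String[] {tokens[i], tokens[0]});
        }
        return pairs;
    }

    // Shuffle and reduce phase: group pairs by word, keeping each docId once.
    static Map<String, TreeSet<String>> reduce(List<String[]> pairs) {
        Map<String, TreeSet<String>> index = new TreeMap<>();
        for (String[] pair : pairs) {
            index.computeIfAbsent(pair[0], w -> new TreeSet<>()).add(pair[1]);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.addAll(map("5722018411 Hello World Bye World"));
        pairs.addAll(map("6722018415 Hello Hadoop Goodbye Hadoop"));
        // Print one line per word: the word, a tab, then the document IDs.
        reduce(pairs).forEach((word, ids) ->
                System.out.println(word + "\t" + String.join(" ", ids)));
    }
}
```

Under these assumptions, "Hello" maps to both document IDs, while "World" and "Hadoop" each map to a single one even though they appear twice in their line.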

How to build the jar file from the Java source file:

  1. Add these lines to hadoop-env.sh, or set them in the terminal:
    export PATH=${JAVA_HOME}/bin:${PATH}
    export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar

  2. Copy InvertedIndex.java to the root folder of the Hadoop distribution (hadoop-3.2.1)

  3. Then run these commands:
    bin/hadoop com.sun.tools.javac.Main InvertedIndex.java
    jar cf invertedindex.jar InvertedIndex*.class

  4. This creates a jar file along with three class files. Copy all of them to hadoop-3.2.1/share/hadoop/mapreduce
