Giter Club home page Giter Club logo

plagiarism-detector's Introduction

Plagiarism Detector in C++

Uses naive methods to detect observable plagiarism in text files.

Tests implemented

1. Token frequency matching

Target file's word count is matched with each individual textfile's token counts in the database folder to find a high degree of similarity in the tokens and their frequency of use.

2. N-Gram matching

A direct match of consecutive tokens (or ngrams) was performed to detect similarity in patterns and neighbourhood of tokens. The value of N for the N-Gram generation was varied and a cumulative result was obtained by a weighted average over all of the results.

3. Cosine matching

Cosine of the angle between the vectors obtained from the target and the base text files is computed to estimate the simiilarity in the token vectors of both the files.

Getting started

1. Place all the reference text files in the database directory.

2. Place all the text files required to be checked in the target directory.

3. (Optional) Edit the stopwords.txt text file as per requirement, to add words which are to be ignored in the analysis.

4. Change the database and target_folder variables with the actual location of them.

5. Compile run.cpp in C++11 (or above), with the command g++ run.cpp -std=c++11.

6. Run the generated executable with the command ./a.out (in Linux environment).

Image of the project

plagiarism-detector's People

Contributors

ahmedraja1 avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.