Plagiarism Detection

  • A plagiarism detector, following this paper, that examines a text file and performs binary classification: it labels the file as either plagiarized or not, depending on how similar the file is to a provided source text.


Problem Statement and Analysis

  • Plagiarism is defined as “the appropriation of another person's ideas, processes, results, or words without giving appropriate credit”. The goal here is to detect it by comparing an original (source) text with a target text. Following the paper mentioned above, the text is preprocessed, two similarity features are calculated (containment and the longest common subsequence, the latter with a dynamic programming algorithm), and those features are fed into a machine learning model that classifies the target text as plagiarized or not.
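As a rough illustration of the two similarity features named above, here is a minimal Python sketch. It is not the project's actual code. The function names (`ngrams`, `containment`, `lcs_norm_word`), the lowercase whitespace tokenization, and the choice to normalize both scores by the length of the answer text are all assumptions made for this example.

```python
from collections import Counter


def ngrams(text, n):
    """Return the list of word n-grams in `text` (lowercased, whitespace-tokenized)."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def containment(answer, source, n):
    """Share of the answer's n-grams that also occur in the source (0.0 to 1.0)."""
    answer_counts = Counter(ngrams(answer, n))
    source_counts = Counter(ngrams(source, n))
    total = sum(answer_counts.values())
    if total == 0:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared n-gram.
    shared = sum((answer_counts & source_counts).values())
    return shared / total


def lcs_norm_word(answer, source):
    """Longest common word subsequence length, divided by the answer length."""
    a = answer.lower().split()
    s = source.lower().split()
    if not a:
        return 0.0
    # Dynamic programming over one row at a time:
    # prev[j] is the LCS length of the answer words seen so far and s[:j].
    prev = [0] * (len(s) + 1)
    for word in a:
        curr = [0]
        for j, other in enumerate(s, start=1):
            if word == other:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / len(a)


answer = "the quick brown fox"
source = "the quick red fox jumps"
print(containment(answer, source, 1), lcs_norm_word(answer, source))
```

Both scores lie between 0 and 1, with higher values meaning the answer overlaps more with the source. Keeping only two rows of the dynamic programming table keeps memory proportional to the source length rather than to the product of both lengths.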

    Created features.

  • Before preparing the final dataset, I created multiple features using containment over several n-gram sizes and the longest common subsequence, then calculated a correlation matrix so that very highly correlated columns could be dropped.

Correlation Matrix.
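The correlation-based filtering described above could look roughly like the following sketch. It uses only the standard library so that it runs anywhere. The column names (`c_1`, `c_2`, `lcs_word`), the sample values, and the 0.95 cutoff are hypothetical and not taken from this project, and the greedy keep-first strategy is one possible choice among several.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat it as 0
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.95):
    """Greedily keep columns whose |correlation| with every kept column is <= threshold.

    `features` maps a column name to its list of values. The 0.95 cutoff is
    an illustrative assumption, not a value from this project.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature values for five answer files.
features = {
    "c_1": [0.40, 0.55, 0.90, 0.30, 1.00],
    "c_2": [0.20, 0.35, 0.80, 0.10, 1.00],
    "lcs_word": [0.90, 0.20, 0.50, 0.80, 0.30],
}
print(drop_correlated(features))
```

Because the filter is greedy, the order of the columns decides which member of a highly correlated pair survives; here the earlier column is kept.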

Dataset

  • This data is a slightly modified version of a dataset created by Paul Clough (Information Studies) and Mark Stevenson (Computer Science) at the University of Sheffield. You can read all about the data collection and corpus at their university webpage.

Citation for data: Clough, P. and Stevenson, M. Developing A Corpus of Plagiarised Short Answers, Language Resources and Evaluation: Special Issue on Plagiarism and Authorship Analysis,

Project Flow

  • Data Exploration

  • Defining Features

  • Train and Deploy the Model on AWS SageMaker

Contributors

  • mostafa-ashraf19
