Plagiarism Detector

The Plagiarism Detector project detects plagiarism and finds similarities between text documents. Users enter text or upload files through a web application built with Streamlit. The system tokenizes the input, uses web scraping to extract relevant text for comparison, and scores pairs of texts by cosine similarity. Results are shown as interactive visualizations built with Plotly and Plotly Express, so likely instances of plagiarism are easy to spot. The tool is intended for educators, researchers, and content creators who need to check the integrity and originality of written work.

Features

  • Light/dark mode toggle
  • Fullscreen mode
  • Flexible Input Options
  • Comprehensive Analysis
  • Web Scraping Capabilities
  • Advanced Similarity Measurement
  • Interactive Visualizations
  • User-Friendly Interface

Tech Stack

Client: Streamlit, HTML/CSS.

Server: Python

Libraries & Algorithms:

Libraries:

  • Pandas
  • NLTK (Natural Language Toolkit)
  • Beautiful Soup
  • CountVectorizer and cosine_similarity from scikit-learn
  • docx2txt
  • PyPDF2
  • plotly.express (px)

Algorithms:

  • Tokenization
  • Cosine Similarity
  • Web Scraping
  • Document Retrieval
  • Data Visualization
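
To make the tokenization and cosine similarity steps concrete, here is a minimal standard-library sketch of the arithmetic involved. It is an illustration only, not the project's code: the function names `tokenize` and `cosine_from_counts` are hypothetical, and the regular expression is an assumption chosen to resemble scikit-learn's default word pattern (lowercased tokens of two or more characters).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens of 2+ characters."""
    return re.findall(r"\b\w\w+\b", text.lower())


def cosine_from_counts(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only words present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a text with no tokens cannot be compared
    return dot / (norm_a * norm_b)
```

A score near 1 means the two texts use the same words in similar proportions; a score of 0 means they share no tokens. Word order is ignored, so this measure alone cannot tell a reordered copy from the original.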

Usage/Examples

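from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def get_similarity(text1, text2):
    """Return the cosine similarity of the word-count vectors of two texts."""
    cv = CountVectorizer()
    count_matrix = cv.fit_transform([text1, text2])
    # Entry [0][1] of the 2x2 similarity matrix compares text1 with text2.
    return cosine_similarity(count_matrix)[0][1]


def get_similarity_list(texts, filenames=None):
    """Compare every unordered pair of texts once.

    Returns a list of (filename_a, filename_b, similarity) tuples.
    """
    if filenames is None:
        # Fall back to generic labels when no filenames are supplied.
        filenames = [f"File {i+1}" for i in range(len(texts))]
    similarity_list = []
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            similarity = get_similarity(texts[i], texts[j])
            similarity_list.append((filenames[i], filenames[j], similarity))
    return similarity_list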
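
One possible way to use the output of `get_similarity_list` is to keep only the pairs above a cutoff and list them from most to least similar. The sketch below assumes the `(filename_a, filename_b, similarity)` tuple shape shown above; the helper `flag_pairs`, the 0.8 threshold, and the sample filenames and scores are all hypothetical and made up for illustration, not results from the project.

```python
def flag_pairs(similarity_list, threshold=0.8):
    """Return the pairs scoring at or above threshold, highest score first."""
    flagged = [row for row in similarity_list if row[2] >= threshold]
    return sorted(flagged, key=lambda row: row[2], reverse=True)


# Hand-written sample in the shape returned by get_similarity_list.
sample = [
    ("essay_a.txt", "essay_b.txt", 0.91),
    ("essay_a.txt", "essay_c.txt", 0.12),
    ("essay_b.txt", "essay_c.txt", 0.85),
]

for name_a, name_b, score in flag_pairs(sample):
    print(f"{name_a} vs {name_b}: {score:.0%}")
```

The right threshold depends on the kind of documents being compared, so it is left as a parameter rather than fixed.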

Run Locally

Clone the project

  git clone https://github.com/Karthik-02/plagiarism-detection.git

Go to the project directory

  cd plagiarism-detection

Install dependencies

  pip install -r requirements.txt

Start the server

  streamlit run app.py

Authors

Badges

MIT License

Contributors

karthik-02

Issues

Missing file

app.py imports docx2txt, but docx2txt has not been uploaded to the repository.
