duplicate-code-detection-tool's Introduction

Hi there, I'm Dimitris 👋

I call myself a Software Engineer and a Maker. During the day I like to hack on embedded systems of all shapes and sizes. Having worked on systems running Yocto Linux and the Android Open Source Project (AOSP), I am very passionate about software architecture and clean code as the enablers of high quality, reusability, and maintainability. Consequently, I incorporate this mindset and these principles into my way of working, as I believe they are fundamental for agility and effectiveness.

Usually, I spend my evenings on open source projects which I blog about. My favorite ones revolve around IoT, robotics and handheld gadgets where I love to develop the entire product stack. Hardware-wise, this includes everything from the PCB to the physical case around the electronics and software, from the embedded firmware to the cloud. Lately, I have also gotten into creating YouTube tutorials about software craftsmanship with C++ and embedded systems.

Additionally, I teach the undergraduate DIT112 course on Systems Development at the University of Gothenburg. In DIT112, students learn how to create a system composed of both software and hardware components, covering development, requirements handling, testing and quality assurance, all in an agile manner. Check out some of their projects: [2019], [2020], [2021], [2022].

During the autumn term of 2021, I also taught the postgraduate DAT265 course on Software Evolution at Chalmers (Gothenburg). The students were taught how to evolve an existing code base by using various techniques to comprehend and refactor it while being in contact with the upstream project's stakeholders.

duplicate-code-detection-tool's People

Contributors

cael35, dzibukalexander, itaykraise-vayyar, iwishiwasaneagle, jmartens, platisd, tacevee, whisperstream

duplicate-code-detection-tool's Issues

GitHub Action fails to post when the report has too many characters

I am running the action on a legacy repo that has lots of duplication. So much, in fact, that the output is larger than GitHub allows in a comment.

Posting results to GitHub failed with code: 422
{"message":"Validation Failed","errors":[{"resource":"IssueComment","code":"unprocessable","field":"data","message":"Body is too long (maximum is 65536 characters)"}],"documentation_url":"https://docs.github.com/rest/reference/issues#create-an-issue-comment"}

Can the output be clamped in size to avoid this error?

My temporary workaround is to increase the ignore_below value.
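
One way the clamping asked for above could look, as a minimal sketch. The 65536 limit is taken from the error message; the function name, the truncation notice, and the assumption that GitHub counts characters the same way Python's len() does are all hypothetical and not part of the tool.

```python
# Limit quoted in the GitHub error message above.
MAX_COMMENT_CHARS = 65536
# Hypothetical notice appended when the report is cut short.
TRUNCATION_NOTICE = "\n\n(Report truncated to fit the GitHub comment size limit.)"


def clamp_comment(body: str, limit: int = MAX_COMMENT_CHARS) -> str:
    """Return body unchanged if it fits, otherwise cut it and append a notice."""
    if len(body) <= limit:
        return body
    keep = limit - len(TRUNCATION_NOTICE)
    # Cut at the last line break before the limit so no table row is split.
    cut = body.rfind("\n", 0, keep)
    if cut == -1:
        cut = keep
    return body[:cut] + TRUNCATION_NOTICE


# Example: a made-up report that is far too long for a single comment.
report = "\n".join(f"| file_{i}.cpp | 100.00 |" for i in range(5000))
clamped = clamp_comment(report)
print(len(report), len(clamped))
```

Cutting at a line break keeps the remaining rows of the report intact, at the cost of dropping everything after the cut.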

feature request: Ignore comments

I'm getting high similarity scores, even though analyzing the files with grep -Fxf FILE1 FILE2 shows that only the comments match. It would be nice to have an option to ignore comments.
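
A rough sketch of what such preprocessing might look like, assuming it is enough to drop whole-line comments before the files are compared. The extension-to-prefix table and the function name are hypothetical; block comments (/* ... */) and trailing comments are deliberately not handled.

```python
# Hypothetical mapping from file extension to its line-comment prefix.
# The prefix depends on the language: in C and C++ a leading '#' marks a
# preprocessor directive such as #include, not a comment.
COMMENT_PREFIX_BY_EXTENSION = {
    "py": "#",
    "c": "//",
    "h": "//",
    "cpp": "//",
    "hpp": "//",
    "cc": "//",
    "java": "//",
    "cs": "//",
}


def strip_comment_lines(source: str, extension: str) -> str:
    """Remove lines that consist only of a line comment."""
    prefix = COMMENT_PREFIX_BY_EXTENSION.get(extension)
    if prefix is None:
        return source  # Unknown language: leave the text untouched.
    kept = [
        line
        for line in source.splitlines()
        if not line.lstrip().startswith(prefix)
    ]
    return "\n".join(kept)


cpp_source = "// License header\n#include <vector>\nint x = 1;  // counter\n"
print(strip_comment_lines(cpp_source, "cpp"))
```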

clone doesn't work for private repositories

I get this error when trying to run the action on a private repo (when I set the repo to public it works just fine):
fatal: could not read Password for 'https://***@github.com': No such device or address

I've looked around and it seems like it's the same issue as this one

Here is a log of my run:

Run platisd/[email protected]
  with:
    github_token: ***
    project_root_dir: projects
    directories: .
    fail_above: 70
    ignore_below: 0
    file_extensions: h, hpp, c, cpp, cc, java, py, cs
    warn_above: 100
/usr/bin/docker run --name a95ec8fa99b7c7894e8abe71ec2eb60de7cf_d2f098 --label 29a95e --workdir /github/workspace --rm -e INPUT_GITHUB_TOKEN -e INPUT_PROJECT_ROOT_DIR -e INPUT_DIRECTORIES -e INPUT_FAIL_ABOVE -e INPUT_IGNORE_BELOW -e INPUT_IGNORE_DIRECTORIES -e INPUT_FILE_EXTENSIONS -e INPUT_WARN_ABOVE -e HOME -e GITHUB_JOB -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_REPOSITORY_OWNER -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RETENTION_DAYS -e GITHUB_RUN_ATTEMPT -e GITHUB_ACTOR -e GITHUB_WORKFLOW -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GITHUB_EVENT_NAME -e GITHUB_SERVER_URL -e GITHUB_API_URL -e GITHUB_GRAPHQL_URL -e GITHUB_REF_NAME -e GITHUB_REF_PROTECTED -e GITHUB_REF_TYPE -e GITHUB_WORKSPACE -e GITHUB_ACTION -e GITHUB_EVENT_PATH -e GITHUB_ACTION_REPOSITORY -e GITHUB_ACTION_REF -e GITHUB_PATH -e GITHUB_ENV -e GITHUB_STEP_SUMMARY -e RUNNER_OS -e RUNNER_ARCH -e RUNNER_NAME -e RUNNER_TOOL_CACHE -e RUNNER_TEMP -e RUNNER_WORKSPACE -e ACTIONS_RUNTIME_URL -e ACTIONS_RUNTIME_TOKEN -e ACTIONS_CACHE_URL -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/_temp/_runner_file_commands":"/github/file_commands" -v "/home/runner/work/testing-git-hub-actions/testing-git-hub-actions":"/github/workspace" 29a95e:c8fa99b7c7894e8abe71ec2eb60de7cf
Cloning into 'itaykraise-vayyar/testing-git-hub-actions'...
fatal: could not read Password for 'https://***@github.com': No such device or address

What do you think?

Is the tool suitable for code similarity checks?

From the source code, I see that gensim is used here. Using gensim for text similarity checks is cool, but is it suitable for code similarity checks? For example, I want to compare two Unity projects. All scripts must follow C# syntax, so they share many C# keywords, and because they use the Unity Engine they all contain the same framework statements, such as 'using UnityEngine'. Can gensim ignore these, so that they do not cause false matches?
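
A small illustration that may help with this question. It assumes the similarity is computed on TF-IDF weighted tokens (the issue only states that gensim is used, so this is an assumption) and uses the common log2(N / document frequency) definition of the inverse document frequency. Under that definition, a token that occurs in every file gets a weight of zero. The helper name and the sample token lists are made up.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Inverse document frequency per token: log2(N / number of docs containing it)."""
    total = len(documents)
    doc_frequency = Counter(token for doc in documents for token in set(doc))
    return {token: math.log2(total / count) for token, count in doc_frequency.items()}


# Three made-up, already tokenized Unity scripts.
scripts = [
    ["using", "UnityEngine", "class", "Player", "Jump"],
    ["using", "UnityEngine", "class", "Enemy", "Patrol"],
    ["using", "UnityEngine", "class", "Camera", "Follow"],
]
weights = idf_weights(scripts)
print(weights["UnityEngine"])  # Appears in every script, so log2(3 / 3).
print(weights["Player"])       # Appears in one script, so log2(3 / 1).
```

With such a weighting, shared framework lines contribute little to the score as long as they really appear in all compared files; tokens shared by only some of the files still count.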

Request to provide output markdown file

@platisd

Is it possible for you to provide an output markdown file that I can publish as a check run using another GitHub Action? Alternatively, if you could publish the same markdown table as a check run, that would also be good enough for me.

Reuse of the previous comment

It would be better to reuse (overwrite) the existing comment (which starts with Duplicate code detection tool report) to avoid being flooded with bot comments.
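
A minimal sketch of the lookup this would need, assuming the comments of the pull request have already been fetched as a list of dictionaries with "id" and "body" keys, as the GitHub REST API returns them. The function name is hypothetical, and the prefix is the one quoted in this issue.

```python
# Prefix quoted in the issue; the real report may start differently.
REPORT_PREFIX = "Duplicate code detection tool report"


def find_existing_report(comments):
    """Return the id of the newest comment that looks like a report, else None."""
    for comment in reversed(comments):
        body = comment.get("body") or ""
        if body.lstrip().startswith(REPORT_PREFIX):
            return comment["id"]
    return None


# Made-up comments in the order they were posted.
comments = [
    {"id": 11, "body": "Duplicate code detection tool report\n..."},
    {"id": 12, "body": "Looks good to me"},
    {"id": 13, "body": "Duplicate code detection tool report\n..."},
]
print(find_existing_report(comments))
```

If an id is found, the action could update that comment instead of creating a new one; if None is returned, it would post a fresh comment as it does today.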

Reports files without duplicated code

Hi, I've been using this GH Action with ignore_below: 20, but the tool's report still shows up, even when no similarities are found at all.

Example:
Screenshot 2022-07-29 at 14 56 51

I might be doing something wrong, but I've literally copied the example provided.

Thanks
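
For illustration, the behaviour described as expected here could be sketched as follows. The data layout (a dictionary mapping each file to the similarity scores of the other files) and the function name are assumptions made for this example, not the tool's actual implementation.

```python
def filter_report(similarities, ignore_below):
    """Drop scores below the threshold and files left without any entry."""
    filtered = {}
    for source_file, others in similarities.items():
        kept = {
            other: score
            for other, score in others.items()
            if score >= ignore_below
        }
        if kept:  # Skip files that have nothing left to report.
            filtered[source_file] = kept
    return filtered


# Made-up scores in percent.
similarities = {
    "a.py": {"b.py": 12.5, "c.py": 3.0},
    "b.py": {"a.py": 12.5, "c.py": 45.0},
}
print(filter_report(similarities, ignore_below=20))
```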

Output as a CSV?

Thinking about being able to compare SQL used by data analysts and present similarity results in a familiar form e.g. a table.
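
A sketch of how similarity results might be written as CSV with Python's standard csv module. The input layout, the column names, and the file names are assumptions made for this example.

```python
import csv
import io


def similarities_to_csv(similarities):
    """Render {file_a: {file_b: score}} as CSV text with one row per pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["File A", "File B", "Similarity"])
    for file_a in sorted(similarities):
        for file_b, score in sorted(similarities[file_a].items()):
            writer.writerow([file_a, file_b, f"{score:.2f}"])
    return buffer.getvalue()


# Made-up scores for two SQL files.
similarities = {
    "queries/revenue.sql": {"queries/revenue_v2.sql": 87.5},
    "queries/revenue_v2.sql": {"queries/revenue.sql": 87.5},
}
print(similarities_to_csv(similarities))
```

The resulting text can be opened directly in a spreadsheet, which is probably the most familiar table format for analysts.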

No module named 'gensim'

python duplicate_code_detection.py -d smartcar_shield/src
Traceback (most recent call last):
  File "E:\nbu\similar\code\duplicate-code-detection-tool\duplicate_code_detection.py", line 10, in <module>
    import gensim
ModuleNotFoundError: No module named 'gensim'

--ignore-directories option does not work on Git Bash (Windows)

Hello,
I'm using:
> duplicate_code_detection.py --ignore-directories general/util/async/file/test -d general
in a Git Bash terminal on Windows, but the files in general/util/async/file/test are not ignored.

Example values of the variables involved:

ignore_directories = ['general/util/async/file/test']
files_to_ignore = ['general/util/async/file/test\\util_async_file_test\\File_test.cpp', ...]
source_code_files = [..., 'general\\util\\async\\file\\test\\util_async_file_test\\File_test.cpp',...]

I fixed it. Do you want me to submit a PR? :-)
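
The values above suggest that the ignore list uses forward slashes while the discovered files use backslashes. One possible way to make the comparison independent of the separator is sketched below. This is not the reporter's fix, which is not shown in the issue, and the function names are hypothetical. It also assumes that a backslash never appears as a regular character in a file name.

```python
def to_posix(path: str) -> str:
    """Use forward slashes only, so both separator styles compare equal."""
    return path.replace("\\", "/")


def is_ignored(path: str, ignore_directories) -> bool:
    """True if path lies inside one of the ignored directories."""
    normalized = to_posix(path)
    for directory in ignore_directories:
        # The trailing slash avoids matching 'test' against 'testing'.
        prefix = to_posix(directory).rstrip("/") + "/"
        if normalized.startswith(prefix):
            return True
    return False


ignore_directories = ["general/util/async/file/test"]
source_file = "general\\util\\async\\file\\test\\util_async_file_test\\File_test.cpp"
print(is_ignored(source_file, ignore_directories))
```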

Feature request: Option to filter some files from input directory

It would be nice to have an option to run the comparison on a filtered subset of the files in the directory, rather than on all of them.
Implementation idea: an input argument containing a regular expression that is applied to the list of files.

If you don't mind, I would volunteer to implement it too.
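
The idea could be sketched like this, using Python's re module. The function name and the sample file list are made up for this example, and whether the pattern should select or exclude files is left open in the request; here it selects.

```python
import re


def filter_files(files, pattern):
    """Keep only the files whose path matches the regular expression."""
    regex = re.compile(pattern)
    return [path for path in files if regex.search(path)]


# Made-up file list.
files = [
    "src/EventLoop.cpp",
    "src/EventLoop.hpp",
    "test/EventLoop_test.cpp",
    "README.md",
]
# Example pattern: C++ files that are not located under test/.
print(filter_files(files, r"^(?!test/).*\.(cpp|hpp)$"))
```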

Feature proposal: add an option to add LoC in outputs

Hello,
To help with analysing the results, it would be nice to have a --with-loc option that adds the lines-of-code count for each file.
The outputs would be:

  • text
Code duplication probability for general\bar\EventLoop.cpp,351
--------------------------------------------------------------
                   File,#LoC                     Similarity (%)
--------------------------------------------------------------
general\util\async\EventLoop.cpp,351                 100.00
general\foo\EventLoop.cpp,351                        100.00
  • csv
File A,#LoC A,File B,#LoC B,Similarity
general\bar\EventLoop.cpp,351,general\foo\EventLoop.cpp,351,100.0
general\bar\EventLoop.cpp,351,general\util\async\EventLoop.cpp,351,100.0
  • json
{
    "general\\bar\\EventLoop.cpp": {
        "#LoC": 351,
        "general\\foo\\EventLoop.cpp": {
            "#LoC": 351,
            "similarity": 100.0
        },
        "general\\util\\async\\EventLoop.cpp": {
            "#LoC": 351,
            "similarity": 100.0
        }
    }
}
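
A sketch of how the JSON layout proposed above could be produced. Counting every non-blank line as a line of code is an assumption, since the proposal does not define how LoC is measured; the function names and the sample data are also made up.

```python
import json


def count_loc(text: str) -> int:
    """Count non-blank lines (an assumed definition of LoC)."""
    return sum(1 for line in text.splitlines() if line.strip())


def with_loc(similarities, sources):
    """Attach '#LoC' to every file, following the JSON layout proposed above."""
    report = {}
    for file_a, others in similarities.items():
        entry = {"#LoC": count_loc(sources[file_a])}
        for file_b, score in others.items():
            entry[file_b] = {
                "#LoC": count_loc(sources[file_b]),
                "similarity": score,
            }
        report[file_a] = entry
    return report


# Made-up file contents and scores.
sources = {
    "bar/EventLoop.cpp": "int run();\n\nint stop();\n",
    "foo/EventLoop.cpp": "int run();\n\nint stop();\n",
}
similarities = {"bar/EventLoop.cpp": {"foo/EventLoop.cpp": 100.0}}
print(json.dumps(with_loc(similarities, sources), indent=4))
```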

How to use it

I don't know how to use it. I have a Django project that runs in Docker for local development, and I need to run this tool within it. How?
