compare50's Introduction

compare50

compare50 is currently under active development.

compare50's People

Contributors

cmlsharp, dmalan, jelleas, jsarchibald, rongxin-liu, tlively

compare50's Issues

Compare50 raises an AttributeError on python3.8 (+ mac?)

Process SpawnProcess-4:
Traceback (most recent call last):
  File "/usr/local/var/pyenv/versions/3.8.0/lib/python3.8/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
  File "/usr/local/var/pyenv/versions/3.8.0/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/var/pyenv/versions/3.8.0/lib/python3.8/concurrent/futures/process.py", line 233, in _process_worker
    call_item = call_queue.get(block=True)
  File "/usr/local/var/pyenv/versions/3.8.0/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'Preprocessor' on <module '__main__' (built-in)>

Looks like something changed with pickle in 3.8.

Quick workaround for now: run compare50 with --debug.
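A plausible explanation, offered as an assumption rather than a confirmed diagnosis: starting with Python 3.8, multiprocessing on macOS defaults to the spawn start method instead of fork. A spawned worker starts a fresh interpreter and has to look up anything it unpickles, and pickle stores classes by reference (module name plus qualified name), not by value. The sketch below illustrates that by-reference behavior with a standard-library class; `Fraction` is only a stand-in for compare50's `Preprocessor`.

```python
import pickle
from fractions import Fraction

# Pickling a class records only where to find it, not its definition.
payload = pickle.dumps(Fraction)

# The payload names the module and the class; the process that unpickles it
# must be able to import that module and find the attribute there.
print(b"fractions" in payload, b"Fraction" in payload)

# Unpickling resolves the reference back to the very same class object.
assert pickle.loads(payload) is Fraction
```

If a class is reachable only through the parent's `__main__`, a worker whose `__main__` is a different module cannot resolve the reference, which would be consistent with the AttributeError above.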

Null Interval objects not allowed in IntervalTree

It appears that Markdown code blocks mess up compare50's span ranges: the starting index of some tokens is reset when a new code block (```LANGUAGE) is encountered, presumably because the tokenizer treats separate code blocks as separate source files. For example:

a/foo.md:

    ```lua
    local push = require "push"

    local gameWidth, gameHeight = 1080, 720 --fixed game resolution
    local windowWidth, windowHeight = love.window.getDesktopDimensions()

    push:setupScreen(gameWidth, gameHeight, windowWidth, windowHeight, {fullscreen = true})

    function love.draw()
      push:start()
  
      --draw here
  
      push:finish()
    end
    ```
    ```lua
    local push = require "push"

    local gameWidth, gameHeight = 1080, 720 --fixed game resolution
    local windowWidth, windowHeight = love.window.getDesktopDimensions()
    windowWidth, windowHeight = windowWidth*.7, windowHeight*.7 --make the window a bit 
    smaller than the screen itself

    push:setupScreen(gameWidth, gameHeight, windowWidth, windowHeight, {fullscreen = false})

    function love.draw()
      push:start()
  
      --draw here
  
      push:finish()
    end
    ```

b/foo.md:

    ```lua
    local push = require "push"

    local gameWidth, gameHeight = 1080, 720 --fixed game resolution
    local windowWidth, windowHeight = love.window.getDesktopDimensions()

    push:setupScreen(gameWidth, gameHeight, windowWidth, windowHeight, {fullscreen = true})

    function love.draw()
      push:start()
      
      --draw here
      
      push:finish()
    end
    local push = require "push"

    local gameWidth, gameHeight = 1080, 720 --fixed game resolution
    local windowWidth, windowHeight = love.window.getDesktopDimensions()
    windowWidth, windowHeight = windowWidth*.7, windowHeight*.7 --make the window a bit smaller than the screen itself

    push:setupScreen(gameWidth, gameHeight, windowWidth, windowHeight, {fullscreen = false})

    function love.draw()
      push:start()
      
      --draw here
      
      push:finish()
    end
    ```

Running compare50 on these produces the following:

$ compare50 --passes structure --verbose a/foo.md b/foo.md
...
Sorry, something's wrong! Let [email protected] know!
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/site-packages/compare50/__main__.py", line 351, in <module>
    main()
  File "/usr/local/lib/python3.7/site-packages/compare50/__main__.py", line 340, in main
    pass_to_results[pass_] = _api.compare(scores, ignored_files, pass_)
  File "/usr/local/lib/python3.7/site-packages/compare50/_api.py", line 76, in compare
    for comparison in pass_.comparator.compare(scores, ignored_files):
  File "/usr/local/lib/python3.7/site-packages/compare50/comparators/_winnowing.py", line 133, in compare
    span_matches += _api.expand(index_a.compare(index_b), tokens_a, tokens_b)
  File "/usr/local/lib/python3.7/site-packages/compare50/_api.py", line 231, in expand
    span_tree_a.addi(new_span_a.start, new_span_a.end)
  File "/usr/local/lib/python3.7/site-packages/intervaltree/intervaltree.py", line 330, in addi
    return self.add(Interval(begin, end, data))
  File "/usr/local/lib/python3.7/site-packages/intervaltree/intervaltree.py", line 313, in add
    " {0}".format(interval)
ValueError: IntervalTree: Null Interval objects not allowed in IntervalTree: Interval(335, 173)
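To illustrate the suspected cause, here is a minimal sketch, not compare50's actual tokenizer. It assumes token offsets restart at zero in each fenced block, and shows how adding each block's offset back in keeps a span that crosses blocks well-formed. The names `is_null` and `rebase`, the block layout, and all numbers except 335 and 173 are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets


def is_null(begin: int, end: int) -> bool:
    """An interval is empty or inverted when it does not end after it begins."""
    return begin >= end


def rebase(blocks: List[Tuple[int, List[Span]]]) -> List[Span]:
    """Convert block-relative spans into file-relative spans.

    Each entry is (offset of the block within the file, spans relative to
    that block). This layout is an assumption made for the sketch.
    """
    absolute = []
    for block_offset, spans in blocks:
        for start, end in spans:
            absolute.append((block_offset + start, block_offset + end))
    return absolute


# Numbers chosen to mirror Interval(335, 173) from the traceback: a match
# starts at offset 335 in the first block and ends at offset 173 of the
# second block, whose offsets restarted from zero.
blocks = [(0, [(335, 350)]), (360, [(150, 173)])]

broken = (blocks[0][1][0][0], blocks[1][1][-1][1])
print(broken, is_null(*broken))

fixed_spans = rebase(blocks)
fixed = (fixed_spans[0][0], fixed_spans[-1][1])
print(fixed, is_null(*fixed))
```

Whether the right fix is to rebase offsets or to stop expanding spans across block boundaries is a design question this sketch does not settle.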

Some users match too many times

Some submissions match against multiple archives or current students and fill up a large share of the 50 slots. Should we set a maximum number of matches per user, maybe 5 or so, and make it overridable with a command-line flag?

@dmalan @crossroads1112 @Jelleas
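A rough sketch of what such a cap could look like, assuming matches arrive as (submission, matched-against, score) tuples. The function name, the tuple layout, and the default of 5 are illustrative only; for simplicity it counts only the first submission of each pair.

```python
from collections import Counter
from typing import List, Tuple

Match = Tuple[str, str, float]  # (submission, matched against, score)


def cap_matches(matches: List[Match], top_n: int = 50, per_user: int = 5) -> List[Match]:
    """Keep the highest-scoring matches, at most `per_user` per submission."""
    kept: List[Match] = []
    seen: Counter = Counter()
    for sub, other, score in sorted(matches, key=lambda m: m[2], reverse=True):
        if len(kept) == top_n:
            break
        if seen[sub] >= per_user:
            continue  # this submission already used up its slots
        seen[sub] += 1
        kept.append((sub, other, score))
    return kept


# One submission matching eight archives no longer crowds out the others.
matches = [("alice", f"archive{i}", 10.0 - i) for i in range(8)]
matches.append(("bob", "carol", 1.0))
top = cap_matches(matches, top_n=50, per_user=5)
print(top)
```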

Be able to display matches with distro differently

This requires adding distro matches to the output of compare, which requires explicit matching against the distro indices. Should this explicit match happen only for the raw text? Should this be configurable?

Scrolling when clicking "next" is out of sync

When clicking "Next" in match_#.html, the left side scrolls to the next match more quickly than the right side scrolls. Not sure if that's intentional or not, but for scanning matches quickly it would be more efficient if both matches scrolled at the same time.

Write initial frontend template

The frontend should have a view for sorting the result pairs based on scores for different passes and a view that shows a pair of submissions side by side with shared fragments highlighted. Clicking a highlighted fragment should show a hyperlinked list of matching fragments in both submissions. There should be a toggle for turning on and off highlighting of fragments from different passes.

The JSON data will have the following schema:

{
    "files": [file paths...],
    "groups": [[file indices for submission group], ...],
    "results": [
        {
            "subs": [first submission group index, second submission group index],
            "passes": {
                pass name: [
                    score,
                    [
                        [[[file index A, start, stop] for fragment matching hash in first submission],
                        [[file index B, start, stop] for fragment matching hash in second submission]]
                        for each hash shared among the submissions
                    ]
                ]
                ...
            }
        }
        ...
    ]
}
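To make the schema concrete, here is a small made-up instance and a helper that walks it. The file names, score, and offsets are invented, and `fragments` is a hypothetical helper, not part of compare50.

```python
import json

# Two single-file submissions sharing one hash in a pass named "structure".
data = json.loads("""
{
    "files": ["a/foo.c", "b/foo.c"],
    "groups": [[0], [1]],
    "results": [
        {
            "subs": [0, 1],
            "passes": {
                "structure": [
                    0.75,
                    [
                        [[[0, 10, 42]], [[1, 5, 37], [1, 80, 112]]]
                    ]
                ]
            }
        }
    ]
}
""")


def fragments(data, result_index, pass_name):
    """Yield (path, start, stop) for every matching fragment in a result."""
    score, shared = data["results"][result_index]["passes"][pass_name]
    for frags_a, frags_b in shared:  # one entry per shared hash
        for file_index, start, stop in frags_a + frags_b:
            yield data["files"][file_index], start, stop


found = list(fragments(data, 0, "structure"))
print(found)
```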

Add some sanity checks/warnings

E.g., if a user asks us to compare a folder with many subfolders, they may have intended to add /* or similar.

Similarly we may want to blacklist/warn about certain file extensions. Like, "do you really want to include this pdf in the submission?"
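A minimal sketch of both checks, assuming a hypothetical subfolder limit and extension list; neither value comes from compare50.

```python
from pathlib import Path
from typing import List

# Hypothetical defaults chosen for illustration.
SUSPECT_EXTENSIONS = {".pdf", ".zip", ".png", ".jpg"}
MAX_SUBDIRS = 10


def sanity_warnings(submission: Path) -> List[str]:
    """Return human-readable warnings for a submission path."""
    warnings = []
    if submission.is_dir():
        subdirs = [p for p in submission.iterdir() if p.is_dir()]
        if len(subdirs) > MAX_SUBDIRS:
            warnings.append(
                f"{submission} has {len(subdirs)} subfolders; did you mean {submission}/*?"
            )
        files = [p for p in submission.rglob("*") if p.is_file()]
    else:
        files = [submission]
    for path in files:
        if path.suffix.lower() in SUSPECT_EXTENSIONS:
            warnings.append(f"Do you really want to include {path}?")
    return warnings


print(sanity_warnings(Path("essay.pdf")))
```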

show uniqueness of each match

For each matching area, show how many files match ("2 files" means just the current two files), similar to etector. Etector gives details in a tooltip and uses a larger font size for more unique matches. The former is very helpful, but the latter can make comparing files difficult.

Knowing how unique a match is helps a lot when deciding which cases to refer and when articulating to the committee how improbable the similarity is.
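One way the count could be computed, sketched under the assumption that we have a mapping from each file to the fragment hashes found in it. The mapping and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def files_per_fragment(index: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Count how many distinct files contain each fragment hash."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for path, hashes in index.items():
        for h in hashes:
            owners[h].add(path)
    return {h: len(paths) for h, paths in owners.items()}


index = {
    "alice/foo.c": ["h1", "h2"],
    "bob/foo.c": ["h1", "h2"],
    "carol/foo.c": ["h1"],
}
counts = files_per_fragment(index)
print(counts)
```

A fragment with a count of 2 is shared only by the pair being viewed; higher counts indicate a more common fragment.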

add support for modeline

  • Recognize below, where ... is any freeform string (for now).
    • # cs50: ...
    • // cs50: ...
    • /* cs50: ...
  • Can appear on the first line or, if there is a hashbang, the second.
  • Ignore when comparing files.
  • Remove from output, display value thereof elsewhere in UI.
  • If present on first line and blank line below it, remove blank line too.
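The rules above can be sketched as follows. This is one possible reading of the list, not an agreed design; the regular expression and the name `strip_modeline` are assumptions.

```python
import re
from typing import List, Optional, Tuple

# Matches "# cs50: ...", "// cs50: ..." and "/* cs50: ..." (optionally closed by "*/").
MODELINE = re.compile(r"^\s*(?:#|//|/\*)\s*cs50:\s*(.*?)\s*(?:\*/)?\s*$")


def strip_modeline(lines: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return (modeline value or None, lines without the modeline)."""
    # The modeline sits on line 1, or on line 2 when line 1 is a hashbang.
    pos = 1 if lines and lines[0].startswith("#!") else 0
    if pos >= len(lines):
        return None, lines
    match = MODELINE.match(lines[pos])
    if match is None:
        return None, lines
    rest = lines[:pos] + lines[pos + 1:]
    # If the modeline was on the first line and a blank line followed, drop that too.
    if pos == 0 and rest and rest[0].strip() == "":
        rest = rest[1:]
    return match.group(1), rest


value, remaining = strip_modeline(["# cs50: hello", "", "print(1)"])
print(value, remaining)
```

The returned value can be shown elsewhere in the UI, while the remaining lines are what gets compared.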

zsh: argument list too long: compare50

compare50 cash/submissions/* -a cash/archives/2012/fall/* cash/archives/2013/fall/* cash/archives/2014/fall/* cash/archives/2015/fall/* cash/archives/2015/spring/* cash/archives/2016/fall/* cash/archives/2017/fall/* cash/archives/2018/fall/* cash/archives/2018/spring/* cash/archives/honeypots/03052019/*

Can I just do cash/archives/* even though that does not constitute one submission?

Clusters highlight together even when no longer a cluster

If a set of submissions forms a large cluster, and the threshold increases such that the cluster breaks up into smaller clusters, hovering over one of the submissions still highlights all of the submissions in the original cluster rather than the new smaller one. This is usually not ideal because the original cluster is often very large and therefore not too meaningful, since the threshold starts low.

(Screenshot: Screen Shot 2020-07-13 at 11 29 47 AM)
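A sketch of the behavior being asked for, written in Python for illustration even though the frontend itself is not Python: recompute cluster membership from only the edges that survive the current threshold, then highlight the hovered submission's current cluster. The edge tuples and the name `clusters_at` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Edge = Tuple[str, str, float]  # (submission a, submission b, score)


def clusters_at(edges: Iterable[Edge], threshold: float) -> Dict[str, FrozenSet[str]]:
    """Map each submission to its cluster, using only edges scoring >= threshold."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in edges:
        # Register both nodes even if the edge is filtered out.
        root_a = find(a)
        root_b = find(b)
        if score >= threshold:
            parent[root_a] = root_b

    groups: Dict[str, set] = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return {node: frozenset(groups[find(node)]) for node in parent}


edges = [("a", "b", 9.0), ("b", "c", 3.0), ("c", "d", 8.0)]
print(clusters_at(edges, 1.0)["a"])  # one large cluster at a low threshold
print(clusters_at(edges, 5.0)["a"])  # smaller cluster once the weak edge drops out
```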

compare50 - Killed

(Screenshot: Screen Shot 2019-04-04 at 2 31 45 PM)

This keeps happening for the homepage assignment. @crossroads1112 @Jelleas is there a workaround to stop it from timing out? I assume this will probably take the longest to run since some students have a lot of .html files and compare50 is trying to compare each student's html files against every other .html file in the archives and submissions folder.

@dmalan

Show common code

Request from the board: also show common approaches to a problem, i.e. what most students do. This helps answer why it is telling that an approach differs in a suspected plagiarism case.

Have compare50 read from stdin if no submissions given

Problem: We store submissions like so:

<student>/<problem>__<timestamp>

For this we have a small script that selects just the last submitted problem for comparison. Would be nice if we could pipe the result from said script to compare50.
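A minimal sketch of the requested fallback, assuming one path per line on stdin. The function name is hypothetical and the example paths (including their timestamps) are made up.

```python
import io
import sys
from typing import List, TextIO


def submissions_from(args: List[str], stream: TextIO = sys.stdin) -> List[str]:
    """Use paths given on the command line, else one path per line from `stream`."""
    if args:
        return list(args)
    return [line.strip() for line in stream if line.strip()]


# Simulate the output of the selection script being piped in.
demo = io.StringIO("alice/cash__1554400000\n\nbob/cash__1554400100\n")
paths = submissions_from([], demo)
print(paths)
```

Reading paths from stdin would also sidestep shell limits on command-line length, since the paths never pass through argv.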

english_dictionary.txt has 600 permissions instead of 644

$ cli50
$ sudo pip install compare50
$ ls -l /usr/local/lib/python3.7/site-packages/compare50/comparators/
total 1436
-rw-r--r-- 1 root root      74 Aug  7 13:06 __init__.py
drwxr-xr-x 2 root root    4096 Aug  7 13:06 __pycache__/
-rw-r--r-- 1 root root    4078 Aug  7 13:06 _misspellings.py
-rw-r--r-- 1 root root   13639 Aug  7 13:06 _winnowing.py
-rw------- 1 root root 1439228 Aug  7 13:06 english_dictionary.txt
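For checking or repairing an installed copy on a POSIX system, something like the following could work. It is a sketch, not a packaging fix; the helper name is hypothetical and a temporary file stands in for english_dictionary.txt.

```python
import os
import stat
import tempfile


def ensure_world_readable(path: str) -> int:
    """Add group/other read bits if missing and return the resulting mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    wanted = mode | stat.S_IRGRP | stat.S_IROTH
    if wanted != mode:
        os.chmod(path, wanted)
    return stat.S_IMODE(os.stat(path).st_mode)


# Reproduce the situation with a temporary stand-in file set to 600.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o600)
before = stat.S_IMODE(os.stat(demo_path).st_mode)
after = ensure_world_readable(demo_path)
os.remove(demo_path)
print(oct(before), oct(after))
```

The longer-term fix presumably belongs in how the file is packaged, so that it is installed with 644 in the first place.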

move sidebar menu to top

Right now the "structure", "exact", and "misspellings" tabs are on the right side and take up a lot of horizontal real estate. Is it possible to move them along the top somehow?

@dmalan @dlloyd09
