
aga's Issues

Handle mutability

There is a family of issues related to mutability:

  • We should be able to assert that submitted code does (or does not) mutate its inputs, and to inspect the mutation when it does.
  • We should deepcopy inputs when passing them to student submissions, because those submissions may mutate the arguments, causing the golden solution to produce incorrect output; see the sketch below.
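
A minimal sketch of the second point; the helper name is hypothetical, not part of aga's API, and it assumes the inputs support deepcopy and ==:

from copy import deepcopy

def run_on_copies(func, *args, **kwargs):
    """Run func on deep copies of its inputs; report whether it mutated them."""
    args_copy, kwargs_copy = deepcopy(args), deepcopy(kwargs)
    result = func(*args_copy, **kwargs_copy)
    # Any difference between the copies and the originals after the call
    # means the submission mutated its arguments.
    mutated = args_copy != args or kwargs_copy != kwargs
    return result, mutated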

Support hypothesis

We could use "the solution outputs the same value as the reference solution" as the property to be tested against.
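
For instance, a minimal sketch with hypothesis, where golden and submission are placeholders standing in for the reference solution and the loaded student code:

from hypothesis import given
from hypothesis import strategies as st

def golden(x: int) -> int:      # placeholder reference solution
    return x * 2

def submission(x: int) -> int:  # placeholder student submission
    return x + x

@given(st.integers())
def test_matches_golden(x):
    # The property: the submission agrees with the reference on every input.
    assert submission(x) == golden(x)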

Fix coverage

We should fix the coverage data we collect to take into account the
configuration in pyproject.toml. The problem seems to be related to nox,
because this configuration is respected when running pytest --cov
by hand.
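
As a hedged sketch, one way to make the nox session respect pyproject.toml is to point pytest-cov at it explicitly; the session contents here are illustrative, not our actual noxfile:

import nox

@nox.session
def tests(session):
    session.install(".", "pytest", "pytest-cov")
    # Explicitly point coverage at pyproject.toml so the [tool.coverage.*]
    # tables are honored even inside nox's virtualenv.
    session.run("pytest", "--cov=aga", "--cov-config=pyproject.toml")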

Handle exceptions in CLI

Right now, the CLI does not properly handle a large number of errors, instead exposing raw Python tracebacks. We should catch these errors and report reasonable error messages.
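
A minimal sketch of the desired behavior, assuming a click-based command; AgaError and build_zip are hypothetical stand-ins, not our actual internals:

import sys
import click

class AgaError(Exception):
    """Hypothetical base class for user-facing errors."""

def build_zip(source: str) -> str:
    """Stand-in for the real generation logic, which may raise AgaError."""
    raise AgaError(f"could not discover a problem in {source}")

@click.command()
@click.argument("source")
def gen(source: str) -> None:
    try:
        click.echo(build_zip(source))
    except AgaError as err:
        # Report a readable message instead of a bare traceback.
        click.secho(f"error: {err}", fg="red", err=True)
        sys.exit(1)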

Multi-problem submissions

Currently each generated autograder zip only grades a single problem. It would
be convenient to generate zips on e.g. a per-file basis which grade multiple
functions placed in a single Python file.

Determine total score

We can get the total points for an assignment from the assignment.total_points object in the submission_metadata.json file. We should use this to determine how many points the autograder gives out.
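
A sketch of the read, assuming gradescope's standard metadata location; the float() call is defensive, since the field may be serialized as a string:

import json

with open("/autograder/submission_metadata.json") as f:
    metadata = json.load(f)

# Total points available for the whole assignment.
total_points = float(metadata["assignment"]["total_points"])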

e2e testing

We should have e2e test cases which:

  1. use the CLI to generate a zip
  2. load the zip into gradescope's docker image
  3. simulate the work done by gradescope to run the autograder
  4. inspect the resulting JSON

Here's a script, which I've been running by hand, that does some of this work:

#!/bin/bash
# $1: the problem to grade (passed to `aga gen`); $2: the student submission
# file. Builds aga, generates the autograder zip, then replays gradescope's
# build-and-grade steps inside its docker image, dropping into a shell
# afterwards for inspection.

poetry build
zipfile=$(aga gen "$1")

docker run --rm -it                                             \
  -v "$PWD"/"$zipfile":/tmp/autograder.zip                      \
  -v "$PWD"/dist/aga-0.2.0.tar.gz:/autograder/aga/aga.tar.gz    \
  -v "$PWD"/"$2":/autograder/submission/"$1".py                 \
  gradescope/auto-builds bash -c "
bash -c '
set -e

apt-get update

apt-get install -y curl unzip dos2unix

# Mimic gradescope: unpack the autograder zip and run its setup script.
mkdir -p /autograder/source /autograder/results

unzip -n -d /autograder/source /tmp/autograder.zip

cp /autograder/source/run_autograder /autograder/run_autograder

dos2unix /autograder/run_autograder /autograder/source/setup.sh

chmod +x /autograder/run_autograder
apt-get update

bash /autograder/source/setup.sh
apt-get clean
rm -rf /var/lib/apt/lists/* /var/tmp/*

# Grade the submission and dump the resulting JSON.
/autograder/run_autograder
cat /autograder/results/results.json
'
bash
"

Pickling a problem removes its context

Found by Jim while testing. Pickling a problem strips its context, which prevents the wrapped function from capturing its scope; in particular, it can no longer access any builtins. This is caused because, for some reason, I thought we needed to unpickle __dict__ as {}; after testing, it seems this is not necessary, so this should be an easy fix.
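
A hedged sketch of the fix, with illustrative names rather than aga's actual internals:

class Problem:
    """Stand-in for the class whose pickling is broken."""

    def __setstate__(self, state):
        # The buggy version did something like `self.__dict__ = {}`,
        # discarding the captured scope (including builtins). Restoring the
        # pickled state preserves it.
        self.__dict__ = state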

Hidden tests

We should support these, probably via an aga_hidden argument to test_case.
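
A sketch of the proposed API in aga's decorator style (aga_hidden is the new, not-yet-implemented argument):

from aga import problem, test_case

@test_case(4, aga_hidden=True)  # would not be shown to students
@test_case(2)
@problem()
def square(x: int) -> int:
    """Square x."""
    return x * x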

Create test cases from generators

It would be very convenient to create many test cases simultaneously from iterators. Here's an idea for an API:

test_cases(*args, aga_output = None, aga_product = True, aga_squash = False, **kwargs)

The user provides an iterator for each of the *args.

  • If aga_product is True, we create one set of test inputs for each element of the Cartesian product of the iterators. Otherwise, we iterate through all the iterators simultaneously, creating one set of test inputs per iteration, and stopping (or erroring?) if their lengths differ.
  • If aga_squash is False, we create one test case for each set of test inputs; otherwise, we put them all in a single test case, and have that test case loop through each set of inputs.
  • aga_output (and any other aga_-prefixed kwargs) is passed directly to test_case. It should also be a generator and probably only makes sense if aga_product is False, or maybe we just don't want to allow parameterized golden tests like this.
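
For example, under this proposal the following call would create six test cases, one per element of the Cartesian product (illustrative only, since the API doesn't exist yet):

test_cases(
    range(3),           # first positional input: 0, 1, 2
    ["a", "b"],         # second positional input
    aga_product=True,   # Cartesian product: 3 * 2 = 6 input sets
    aga_squash=False,   # one test case per input set
)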

Extra credit

Support test cases worth extra credit, where failing them does not detract
from the total score available.

Better kwarg support

It should be possible to pass almost-arbitrary kwargs to test_case. Probably all of our own kwargs should be prefixed with aga_, and we should reserve all such keywords, allowing the user to define any others as test inputs.
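
An illustrative sketch of that convention; this is proposed, not current, behavior:

from aga import problem, test_case

@test_case(2, exponent=3)  # would call power(2, exponent=3)
@problem()
def power(base: int, exponent: int = 2) -> int:
    """Raise base to the given exponent."""
    return base**exponent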

Error on invalid aga_values

We should raise an error at some point if the sum of the values of the groups or
test cases exceeds the value allocated to the problem or group.
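
A hedged sketch of the check, with illustrative attribute names:

def check_point_values(problem):
    claimed = sum(case.value for case in problem.test_cases)
    if claimed > problem.value:
        raise ValueError(
            f"test cases claim {claimed} points, but the problem "
            f"is only worth {problem.value}"
        )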

Floating-point comparison

Right now we use TestCase.assertEqual, which is not correct for floats. We
should support assertAlmostEqual.
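
The difference in miniature (assertAlmostEqual rounds the difference to 7 decimal places by default):

import unittest

class FloatDemo(unittest.TestCase):
    def test_sum(self):
        # assertEqual would fail here: 0.1 + 0.2 == 0.30000000000000004
        self.assertAlmostEqual(0.1 + 0.2, 0.3)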

Fix cli docs header levels

Currently, the "Problem Discovery" header is a higher level than the "aga" header autogenerated by sphinx-click, which is not the right behavior.

Handle student-facing incorrect symbol errors

Right now, certain errors raised by loader.load_symbol_from_dir will just bubble up in the gradescope environment and probably stop the autograder run from working at all. We should have better handling for this kind of case.

At minimum, it should handle:

  • SubmissionSyntaxError
  • NoMatchingSymbol
  • TooManyMatchingSymbols
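
A hedged sketch of the handling in the gradescope entrypoint; the exception names come from the list above, while their import locations and the surrounding variables are assumptions:

import json

try:
    under_test = loader.load_symbol_from_dir(submission_dir, expected_symbol)
except (SubmissionSyntaxError, NoMatchingSymbol, TooManyMatchingSymbols) as err:
    # Write a valid results.json so gradescope shows the student a message
    # instead of a crashed autograder run.
    output = {"score": 0, "output": f"Your submission couldn't be loaded: {err}"}
    with open("/autograder/results/results.json", "w") as f:
        json.dump(output, f)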

Capture output

We should have a capture_output decorator which converts a procedure that prints to stdout into a function returning a str containing whatever the wrapped procedure wrote to stdout. Probably this needs to happen after the Problem is created, so that we can do the same for the student submission.
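
A minimal sketch of the decorator (note it discards the wrapped procedure's return value):

import io
from contextlib import redirect_stdout
from functools import wraps

def capture_output(func):
    """Turn a procedure that prints into a function that returns its stdout."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            func(*args, **kwargs)
        return buffer.getvalue()
    return wrapper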

Extract gradescope resources into separate library

It's kind of awkward that these are currently exposed from our user-facing library. This would include the test runner at minimum; I think we probably need to ship copies of run_autograder and setup.sh, but those can pull in gradescope_main from an aga-gradescope-backend library or similar instead of pulling in aga itself.
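
Under that split, run_autograder's Python entrypoint might look like this; the package name comes from this issue, and gradescope_main is assumed to keep whatever interface it has today:

from aga_gradescope_backend import gradescope_main  # rather than from aga itself

gradescope_main()  # with whatever arguments run_autograder passes today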

Custom python version

It would be nice to be able to specify Python versions to run tests against in the gradescope environment, so that student code that relies on post-3.6 features won't fail. This might require writing our own docker container to run tests in.
