
aga's Issues

Handle mutability

There is a family of issues related to mutability:

  • We should be able to assert that submitted code does (or does not) mutate its inputs, and to inspect the mutation when it does.
  • We should deepcopy inputs when passing them to student submissions, because those submissions may mutate the arguments, causing the golden solution to produce incorrect output; see the sketch below.
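
A minimal sketch of the second point; the helper name is hypothetical, not part of aga's API, and it assumes the inputs support deepcopy and ==:

from copy import deepcopy

def run_on_copies(func, *args, **kwargs):
    """Run func on deep copies of its inputs; report whether it mutated them."""
    args_copy, kwargs_copy = deepcopy(args), deepcopy(kwargs)
    result = func(*args_copy, **kwargs_copy)
    # Any difference between the copies and the originals after the call
    # means the submission mutated its arguments.
    mutated = args_copy != args or kwargs_copy != kwargs
    return result, mutated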

Support hypothesis

We could use "the solution outputs the same value as the reference solution" as the property to be tested against.
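
For instance, a minimal sketch with hypothesis, where golden and submission are placeholders standing in for the reference solution and the loaded student code:

from hypothesis import given
from hypothesis import strategies as st

def golden(x: int) -> int:      # placeholder reference solution
    return x * 2

def submission(x: int) -> int:  # placeholder student submission
    return x + x

@given(st.integers())
def test_matches_golden(x):
    # The property: the submission agrees with the reference on every input.
    assert submission(x) == golden(x)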

Fix coverage

We should fix the coverage data we collect to take into account the
configuration in pyproject.toml. The problem seems to be related to nox,
because this configuration is respected when running pytest --cov
by hand.
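
As a hedged sketch, one way to make the nox session respect pyproject.toml is to point pytest-cov at it explicitly; the session contents here are illustrative, not our actual noxfile:

import nox

@nox.session
def tests(session):
    session.install(".", "pytest", "pytest-cov")
    # Explicitly point coverage at pyproject.toml so the [tool.coverage.*]
    # tables are honored even inside nox's virtualenv.
    session.run("pytest", "--cov=aga", "--cov-config=pyproject.toml")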

Handle exceptions in CLI

Right now, the CLI does not properly handle a large number of errors, instead exposing raw Python tracebacks. We should catch these errors and report reasonable error messages.
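
A minimal sketch of the desired behavior, assuming a click-based command; AgaError and build_zip are hypothetical stand-ins, not our actual internals:

import sys
import click

class AgaError(Exception):
    """Hypothetical base class for user-facing errors."""

def build_zip(source: str) -> str:
    """Stand-in for the real generation logic, which may raise AgaError."""
    raise AgaError(f"could not discover a problem in {source}")

@click.command()
@click.argument("source")
def gen(source: str) -> None:
    try:
        click.echo(build_zip(source))
    except AgaError as err:
        # Report a readable message instead of a bare traceback.
        click.secho(f"error: {err}", fg="red", err=True)
        sys.exit(1)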

Multi-problem submissions

Currently each generated autograder zip only grades a single problem. It would
be convenient to generate zips on e.g. a per-file basis which grade multiple
functions placed in a single Python file.

Determine total score

We can get the total points for an assignment from the assignment.total_points object in the submission_metadata.json file. We should use this to determine how many points the autograder gives out.
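
A sketch of the read, assuming gradescope's standard metadata location; the float() call is defensive, since the field may be serialized as a string:

import json

with open("/autograder/submission_metadata.json") as f:
    metadata = json.load(f)

# Total points available for the whole assignment.
total_points = float(metadata["assignment"]["total_points"])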

e2e testing

We should have e2e test cases which:

  1. use the CLI to generate a zip
  2. load the zip into gradescope's docker image
  3. simulate the work done by gradescope to run the autograder
  4. inspect the resulting JSON

Here's a script, which I've been running by hand, that does some of this work:

#!/bin/bash
# $1: the problem to grade (passed to `aga gen`); $2: the student submission
# file. Builds aga, generates the autograder zip, then replays gradescope's
# build-and-grade steps inside its docker image, dropping into a shell
# afterwards for inspection.

poetry build
zipfile=$(aga gen "$1")

docker run --rm -it                                             \
  -v "$PWD"/"$zipfile":/tmp/autograder.zip                      \
  -v "$PWD"/dist/aga-0.2.0.tar.gz:/autograder/aga/aga.tar.gz    \
  -v "$PWD"/"$2":/autograder/submission/"$1".py                 \
  gradescope/auto-builds bash -c "
bash -c '
set -e

apt-get update

apt-get install -y curl unzip dos2unix

# Mimic gradescope: unpack the autograder zip and run its setup script.
mkdir -p /autograder/source /autograder/results

unzip -n -d /autograder/source /tmp/autograder.zip

cp /autograder/source/run_autograder /autograder/run_autograder

dos2unix /autograder/run_autograder /autograder/source/setup.sh

chmod +x /autograder/run_autograder
apt-get update

bash /autograder/source/setup.sh
apt-get clean
rm -rf /var/lib/apt/lists/* /var/tmp/*

# Grade the submission and dump the resulting JSON.
/autograder/run_autograder
cat /autograder/results/results.json
'
bash
"

Pickling a problem removes its context

Found by Jim while testing. Pickling a problem strips its context, which prevents the wrapped function from capturing its scope; in particular, it can no longer access any builtins. This is caused because, for some reason, I thought we needed to unpickle __dict__ as {}; after testing, it seems this is not necessary, so this should be an easy fix.
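
A hedged sketch of the fix, with illustrative names rather than aga's actual internals:

class Problem:
    """Stand-in for the class whose pickling is broken."""

    def __setstate__(self, state):
        # The buggy version did something like `self.__dict__ = {}`,
        # discarding the captured scope (including builtins). Restoring the
        # pickled state preserves it.
        self.__dict__ = state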

Hidden tests

We should support these, probably via an aga_hidden argument to test_case.
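
A sketch of the proposed API in aga's decorator style (aga_hidden is the new, not-yet-implemented argument):

from aga import problem, test_case

@test_case(4, aga_hidden=True)  # would not be shown to students
@test_case(2)
@problem()
def square(x: int) -> int:
    """Square x."""
    return x * x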

Create test cases from generators

It would be very convenient to create many test cases simultaneously from iterators. Here's an idea for an API:

test_cases(*args, aga_output = None, aga_product = True, aga_squash = False, **kwargs)

The user provides an iterator for each of the *args.

  • If aga_product is True, we create one set of test inputs for each element of the Cartesian product of the iterators. Otherwise, we iterate through all the iterators simultaneously, creating one set of test inputs per iteration, and stopping (or erroring?) if their lengths differ.
  • If aga_squash is False, we create one test case for each set of test inputs; otherwise, we put them all in a single test case, and have that test case loop through each set of inputs.
  • aga_output (and any other aga_-prefixed kwargs) is passed directly to test_case. It should also be a generator and probably only makes sense if aga_product is False, or maybe we just don't want to allow parameterized golden tests like this.
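
For example, under this proposal the following call would create six test cases, one per element of the Cartesian product (illustrative only, since the API doesn't exist yet):

test_cases(
    range(3),           # first positional input: 0, 1, 2
    ["a", "b"],         # second positional input
    aga_product=True,   # Cartesian product: 3 * 2 = 6 input sets
    aga_squash=False,   # one test case per input set
)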

Extra credit

Support test cases worth extra credit, where failing them does not detract
from the total score available.

Better kwarg support

It should be possible to pass almost-arbitrary kwargs to test_case. Probably all of our own kwargs should be prefixed with aga_, and we should reserve all such keywords, allowing the user to define any others as test inputs.
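
An illustrative sketch of that convention; this is proposed, not current, behavior:

from aga import problem, test_case

@test_case(2, exponent=3)  # would call power(2, exponent=3)
@problem()
def power(base: int, exponent: int = 2) -> int:
    """Raise base to the given exponent."""
    return base**exponent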

Error on invalid aga_values

We should raise an error at some point if the sum of the values of the groups or
test cases exceeds the value allocated to the problem or group.
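
A hedged sketch of the check, with illustrative attribute names:

def check_point_values(problem):
    claimed = sum(case.value for case in problem.test_cases)
    if claimed > problem.value:
        raise ValueError(
            f"test cases claim {claimed} points, but the problem "
            f"is only worth {problem.value}"
        )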

Floating-point comparison

Right now we use TestCase.assertEqual, which is not correct for floats. We
should support assertAlmostEqual.
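
The difference in miniature (assertAlmostEqual rounds the difference to 7 decimal places by default):

import unittest

class FloatDemo(unittest.TestCase):
    def test_sum(self):
        # assertEqual would fail here: 0.1 + 0.2 == 0.30000000000000004
        self.assertAlmostEqual(0.1 + 0.2, 0.3)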

Fix cli docs header levels

Currently, the "Problem Discovery" header is a higher level than the "aga" header autogenerated by sphinx-click, which is not the right behavior.

Handle student-facing incorrect symbol errors

Right now, certain errors raised by loader.load_symbol_from_dir will just bubble up in the gradescope environment and probably stop the autograder run from working at all. We should have better handling for this kind of case.

At minimum, it should handle:

  • SubmissionSyntaxError
  • NoMatchingSymbol
  • TooManyMatchingSymbols
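
A hedged sketch of the handling in the gradescope entrypoint; the exception names come from the list above, while their import locations and the surrounding variables are assumptions:

import json

try:
    under_test = loader.load_symbol_from_dir(submission_dir, expected_symbol)
except (SubmissionSyntaxError, NoMatchingSymbol, TooManyMatchingSymbols) as err:
    # Write a valid results.json so gradescope shows the student a message
    # instead of a crashed autograder run.
    output = {"score": 0, "output": f"Your submission couldn't be loaded: {err}"}
    with open("/autograder/results/results.json", "w") as f:
        json.dump(output, f)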

Capture output

We should have a capture_output decorator which converts a procedure that prints to stdout into a function returning a str containing whatever the wrapped procedure wrote to stdout. Probably this needs to happen after the Problem is created, so that we can do the same for the student submission.
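
A minimal sketch of the decorator (note it discards the wrapped procedure's return value):

import io
from contextlib import redirect_stdout
from functools import wraps

def capture_output(func):
    """Turn a procedure that prints into a function that returns its stdout."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            func(*args, **kwargs)
        return buffer.getvalue()
    return wrapper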

Extract gradescope resources into separate library

It's kind of awkward that these are currently exposed from our user-facing library. This would include the test runner at minimum; I think we probably need to ship copies of run_autograder and setup.sh, but those can pull in gradescope_main from an aga-gradescope-backend library or similar instead of pulling in aga itself.
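
Under that split, run_autograder's Python entrypoint might look like this; the package name comes from this issue, and gradescope_main is assumed to keep whatever interface it has today:

from aga_gradescope_backend import gradescope_main  # rather than from aga itself

gradescope_main()  # with whatever arguments run_autograder passes today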

Custom python version

It would be nice to be able to specify Python versions to run tests against in the gradescope environment, so that student code that relies on post-3.6 features won't fail. This might require writing our own docker container to run tests in.
