rileyshahar / aga

aga grades assignments

License: MIT License
Fairly self-explanatory.
There is a family of issues related to mutability:
We could use "the solution outputs the same value as the reference solution" as the property to be tested against.
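For instance, here's a minimal sketch of that property (the function and parameter names are illustrative, not part of aga; deep-copying the inputs is one way to keep the mutability issues above from letting one run contaminate the other):

from copy import deepcopy
from typing import Any, Callable

def outputs_match(
    reference: Callable[..., Any], student: Callable[..., Any], *inputs: Any
) -> bool:
    """Property: the student solution outputs the same value as the reference.

    Each callable runs on its own deep copy of the inputs, so a solution
    which mutates its arguments can't affect the other's run.
    """
    return student(*deepcopy(inputs)) == reference(*deepcopy(inputs))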
We should fix the coverage data we collect to take into account the configuration in pyproject.toml. It looks like the problem has to do with nox, because this information is taken into account when running pytest --cov by hand.
Having this is just good practice.
Right now, the CLI does not properly handle many kinds of errors, instead exposing raw Python tracebacks. We should catch these errors and report reasonable error messages.
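For instance, a sketch of the pattern (assuming a click-based command, which the sphinx-click mention below suggests; gen_zip is a hypothetical internal helper, not aga's API):

import sys

import click

@click.command()
@click.argument("source")
def gen(source: str) -> None:
    """Generate an autograder zip, reporting failures without a traceback."""
    try:
        zipfile = gen_zip(source)  # hypothetical internal helper
    except FileNotFoundError:
        click.secho(f"error: no such file: {source}", fg="red", err=True)
        sys.exit(1)
    click.echo(zipfile)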
Currently, each generated autograder zip only grades a single problem. It would be convenient to generate zips on e.g. a per-file basis, grading multiple functions placed in a single Python file.
_AutograderTestCase should use str(test_input) to display itself.
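Roughly like this (a sketch; the _test_input attribute and the display wording are assumptions about the internal layout, not aga's actual code):

import unittest

class _AutograderTestCase(unittest.TestCase):
    def __init__(self, method_name: str, test_input) -> None:
        super().__init__(method_name)
        self._test_input = test_input

    def __str__(self) -> str:
        # display the case by its input rather than the generic method name
        return f"Test on {str(self._test_input)}."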
We can get the total points for an assignment from the assignment.total_points object in the submission_metadata.json file. We should use this to determine how many points the autograder gives out.
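A sketch of reading that field (/autograder/submission_metadata.json is Gradescope's standard location for this file):

import json

def assignment_total_points(
    path: str = "/autograder/submission_metadata.json",
) -> float:
    """Read the assignment's total points from Gradescope's metadata."""
    with open(path) as f:
        metadata = json.load(f)
    # float() in case the value is serialized as a string
    return float(metadata["assignment"]["total_points"])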
We should have e2e test cases which:
Here's a script I've been using by hand which does some of this work:
#!/bin/bash
# build the aga distribution so it can be installed inside the container
poetry build

# generate the autograder zip for the given problem file
zipfile=$(aga gen "$1")

# replicate Gradescope's build-and-run steps inside their base image;
# $1 is the problem source, $2 is the student submission
docker run --rm -it \
    -v "$PWD"/"$zipfile":/tmp/autograder.zip \
    -v "$PWD"/dist/aga-0.2.0.tar.gz:/autograder/aga/aga.tar.gz \
    -v "$PWD"/"$2":/autograder/submission/"$1".py \
    gradescope/auto-builds bash -c "
        bash -c '
            set -e
            apt-get update
            apt-get install -y curl unzip dos2unix
            mkdir -p /autograder/source /autograder/results
            unzip -n -d /autograder/source /tmp/autograder.zip
            cp /autograder/source/run_autograder /autograder/run_autograder
            dos2unix /autograder/run_autograder /autograder/source/setup.sh
            chmod +x /autograder/run_autograder
            apt-get update
            bash /autograder/source/setup.sh
            apt-get clean
            rm -rf /var/lib/apt/lists/* /var/tmp/*
            /autograder/run_autograder
            cat /autograder/results/results.json
        '
        # drop into an interactive shell afterwards for debugging
        bash
    "
Found by Jim when testing. This prevents it from capturing its scope; in particular, it stops it from accessing any builtins. This happens because, for some reason, I thought we needed to unpickle __dict__ as {}; after testing, it seems this is not necessary, so this should be an easy fix.
For example, we should document aga_name.
We should parse this so we can do useful things with the data.
We should support these, probably via an aga_hidden argument to test_case.
It would be very convenient to create many test cases simultaneously from iterators. Here's an idea for an API:
test_cases(*args, aga_output=None, aga_product=True, aga_squash=False, **kwargs)

The user provides an iterator for each of the *args.

If aga_product is True, we create one set of test inputs for each element of the Cartesian product of the iterators. Otherwise, we iterate through the iterators simultaneously, creating one set of test inputs for each iteration, and stopping (or erroring?) if their lengths differ.

If aga_squash is False, we create one test case for each set of test inputs; otherwise, we put them all in a single test case and have that test case loop through each set of inputs.

aga_output (and any other aga_-prefixed kwargs) is passed directly to test_case. It should also be a generator, and probably only makes sense if aga_product is False, or maybe we just don't want to allow parameterized golden tests like this.
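Here's a minimal sketch of how this could expand into individual test_case calls (aga_squash and generator-valued aga_output are omitted; this is just the idea above, not an implementation in aga, and it assumes test_case is importable from aga's top level):

import itertools
from typing import Any, Callable, Iterable

from aga import test_case  # aga's existing single-case decorator

def test_cases(
    *args: Iterable[Any], aga_product: bool = True, **kwargs: Any
) -> Callable:
    """Create many test cases at once from iterators of inputs."""
    def decorator(func: Callable) -> Callable:
        # either the Cartesian product of the iterators, or a lockstep zip
        # (zip stops at the shortest iterator rather than erroring)
        combos = itertools.product(*args) if aga_product else zip(*args)
        for combo in combos:
            # each combination becomes an ordinary test_case
            func = test_case(*combo, **kwargs)(func)
        return func

    return decorator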
Support test cases being worth extra credit which does not detract from the total score available.
This happens if the values take up precisely all the score.
It should be possible to have almost-arbitrary kwargs as arguments to test_case. Probably all of our kwargs should be prefixed with aga_, and we should just reserve all such keywords, allowing the user to define others as test inputs.
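For example, under that convention (hypothetical usage; add is just an illustration, and the imports assume aga's top-level exports):

from aga import problem, test_case

@test_case(2, y=3, aga_output=5)  # y is a test input; aga_output is reserved
@problem()
def add(x: int, y: int) -> int:
    return x + y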
We should error at some point if the sum of values of each group or each
test case is greater than that allocated to the problem or group.
Right now we use TestCase.assertEqual, which is not correct for floats. We should support assertAlmostEqual.
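For example (this is standard unittest behavior, not aga-specific code):

import unittest

class FloatComparison(unittest.TestCase):
    def test_sum(self) -> None:
        # assertEqual fails here: 0.1 + 0.2 == 0.30000000000000004
        # assertAlmostEqual passes: the difference rounds to zero at
        # 7 decimal places, the default tolerance
        self.assertAlmostEqual(0.1 + 0.2, 0.3)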
We might need to separate each linter into a separate session, not sure.
Currently, the "Problem Discovery" header is a higher level than the "aga" header autogenerated by sphinx-click, which is not the right behavior.
Right now, certain errors raised by loader.load_symbol_from_dir will just bubble up in the gradescope environment and probably stop the autograder run from working at all. We should have better handling for this kind of case.

At a minimum, it seems like this should handle:

SubmissionSyntaxError
NoMatchingSymbol
TooManyMatchingSymbols
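A sketch of the shape this handling might take (the call site, signature, and messages are illustrative; the results.json structure follows Gradescope's documented format):

import json

def load_submission_or_report(path: str, expected_symbol: str):
    try:
        return load_symbol_from_dir(path, expected_symbol)  # illustrative call
    except SubmissionSyntaxError:
        message = "Your submission has a syntax error and could not be loaded."
    except NoMatchingSymbol:
        message = f"We couldn't find {expected_symbol} in your submission."
    except TooManyMatchingSymbols:
        message = f"Multiple symbols in your submission match {expected_symbol}."
    # report a human-readable message instead of crashing the autograder run
    with open("/autograder/results/results.json", "w") as f:
        json.dump({"score": 0, "output": message}, f)
    return None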
We should have a capture_output decorator which converts a procedure that prints to stdout into a function that returns a str containing whatever the wrapped function wrote to stdout. Probably this needs to happen after the Problem is created, so that we can do the same for the student submission.
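A minimal sketch using only the standard library:

import io
from contextlib import redirect_stdout
from functools import wraps
from typing import Any, Callable

def capture_output(func: Callable[..., Any]) -> Callable[..., str]:
    """Convert a procedure that prints to stdout into one returning a str."""
    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> str:
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            func(*args, **kwargs)
        return buffer.getvalue()
    return wrapper

Applied to both the reference solution and the student submission, this would let the existing output-equality checks work unchanged on printed output.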
It's kind of awkward that these are currently exposed from our user-facing library. This would include the test runner at minimum; I think we probably need to ship copies of run_autograder and setup.sh, but those can pull in gradescope_main from an aga-gradescope-backend library or similar instead of pulling in aga itself.
It would be nice to be able to specify python versions to run tests against in the gradescope environment, so that student code that relies on >3.6 features won't fail. This might require writing our own docker container to run tests in.