We need to automate our testing, which currently happens through Google Sheets.
An admin can create a `TestSuite` and `TestQuestion`s under it. The `TestSuite` can be configured with a different `temperature` and `topk`.
A `TestQuestion` will contain a `question` that will be asked to Ayushma, and an `answer` (listed as `human_answer` in the model list below) that will hold a human-entered answer for the question. We need to test whether Ayushma's response is similar to the `answer`, and how similar.
Once the admin triggers a test run, a new `TestRun` instance will be created, linked to a `Project` and the `TestSuite`. The suite will run asynchronously through Celery.
The test will ask each associated `TestQuestion` and create a `TestResult`, which will have the `question` and `human_answer` copied from the `TestQuestion` (do not link the models, because the questions can change), the `answer` that was returned by Ayushma, the `timetaken`, `cosine_sim` and `bleu_score`.
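A minimal, framework-free sketch of the per-question loop described above. It uses plain dataclasses instead of the real models, and `ask` is a hypothetical callable standing in for whatever actually queries Ayushma. Both scores stay empty here because they are computed after all questions are answered.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TestQuestion:
    question: str
    human_answer: str


@dataclass
class TestResult:
    # question and human_answer are copied, not linked, so later edits
    # to a TestQuestion do not change past results.
    question: str
    human_answer: str
    answer: str
    timetaken: float  # seconds (unit is an assumption)
    cosine_sim: Optional[float] = None  # filled in once all questions are answered
    bleu_score: Optional[float] = None


def run_questions(
    questions: List[TestQuestion], ask: Callable[[str], str]
) -> List[TestResult]:
    """Ask every question once and snapshot the inputs next to the answer."""
    results = []
    for q in questions:
        start = time.perf_counter()
        answer = ask(q.question)  # hypothetical call into Ayushma
        elapsed = time.perf_counter() - start
        results.append(TestResult(q.question, q.human_answer, answer, elapsed))
    return results
```

In the real implementation this loop would presumably live inside the Celery task, with the `TestRun` marked `complete` once scoring has finished.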
Once all questions have been answered, we need to calculate the cosine similarity and BLEU score for the result on a scale of 0 to 1. After this, the test is over and the admin can see the results.
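To make the 0 to 1 scale concrete, here is a rough pure-Python sketch of both metrics. It rests on assumptions: the cosine similarity is computed over word-count vectors (the real implementation may well compare embeddings instead), and the BLEU variant is a simplified single-reference version without smoothing, so any answer shorter than four words scores 0. The function names and whitespace tokenizer are illustrative only.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption, not the real one).
    return text.lower().split()


def cosine_sim(a, b):
    """Cosine similarity of word-count vectors; counts are non-negative, so 0..1."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    sq_a = sum(c * c for c in va.values())
    sq_b = sum(c * c for c in vb.values())
    if sq_a == 0 or sq_b == 0:
        return 0.0
    return min(1.0, dot / math.sqrt(sq_a * sq_b))


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_score(reference, candidate, max_n=4):
    """Simplified single-reference BLEU without smoothing; result is 0..1."""
    ref, cand = tokenize(reference), tokenize(candidate)
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_ngrams.values())
        # Clip each n-gram count by how often it appears in the reference.
        clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)
```

Here the human answer would be passed as the reference and Ayushma's answer as the candidate. A maintained library implementation of BLEU is likely a better choice than hand-rolled code for the actual feature.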
Now, update the user model to have an `is_reviewer` field. If a user has `is_reviewer` set to true, they can access the `TestRun`s and their `TestResult`s and add feedback to the results. They will be able to create a new `Feedback` for a `TestResult` by entering a `rating` (Excellent, Good, Satisfactory, Unsatisfactory, Wrong or Hallucinating) and a `note`.
Reviewers can only see their own feedback and cannot edit it later.
Only admins can see the `Feedback`s of all reviewers.
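The access rules above could be sketched roughly as follows, again with plain dataclasses rather than the real models. Several things here are assumptions: the integer values of `Rating`, treating Wrong and Hallucinating as two separate choices, the `is_admin` flag, and the `user_id` field on `Feedback` (the model list does not name an author field, but "their own feedback" implies one). The frozen dataclass only mimics the "cannot edit later" rule; a real API would enforce it in the view or permission layer.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Rating(IntEnum):
    # Integer values are an assumption; the spec only lists the labels.
    HALLUCINATING = 1
    WRONG = 2
    UNSATISFACTORY = 3
    SATISFACTORY = 4
    GOOD = 5
    EXCELLENT = 6


@dataclass
class User:
    id: int
    is_reviewer: bool = False
    is_admin: bool = False  # hypothetical stand-in for the real admin check


@dataclass(frozen=True)  # frozen: feedback cannot be modified after creation
class Feedback:
    user_id: int  # assumed author link, not in the model list
    test_result_id: int
    rating: Rating
    notes: str


def visible_feedback(user: User, feedback: List[Feedback]) -> List[Feedback]:
    """Admins see everything, reviewers see only their own, others see nothing."""
    if user.is_admin:
        return list(feedback)
    if user.is_reviewer:
        return [f for f in feedback if f.user_id == user.id]
    return []
```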
In the end, your new models should look like this (models will extend the base model class):
- `TestSuite`
- `TestQuestion`
  - `test_suite` (fk)
  - `question`
  - `human_answer`
- `TestRun`
  - `test_suite` (fk)
  - `project`
  - `complete` (default false)
- `TestResult`
  - `test_run` (fk)
  - `test_question` (fk)
  - `question`
  - `human_answer`
  - `answer`
  - `cosine_sim`
  - `bleu_score`
- `Feedback`
  - `test_result` (fk)
  - `rating` (integer choice field)
  - `notes`
cc. @bodhish