
py-irt's People

Contributors

charchit7, dependabot[bot], entilzha, jplalor, leo-ware, pk1130, r-b-g-b, zouharvi

py-irt's Issues

One Question in example_rps.py

Could you please explain what 'response' means in this example? We are planning to apply this model to a QA task. Does 'response' indicate whether the model succeeded on the item, or whether the IRT model succeeded in predicting the matching relationship mentioned earlier?

Also, it would be great if there were some related documentation here.

Thanks a lot, and I look forward to your reply!

`py-irt: command not found` even after doing pip install --upgrade py-irt

Hey @EntilZha and @jplalor! Just saw that eval has been implemented :) Thanks a lot! I know that @jplalor had already published version 0.2.1 and done the poetry build required for py-irt to be used as a command in bash. For some reason, when I try to upgrade the package on pip, it doesn't recognize that there is a new version of the package available. To summarize the issue:

  1. What I tried to run: pip install --upgrade py-irt. Nothing gets upgraded; for every dependency I get an output of Requirement already satisfied in ........ Uninstalling and reinstalling does not help. To confirm that an issue exists, I uninstalled the package and tried installing intermediate versions by explicitly running pip install py-irt==0.2.1 and pip install py-irt==0.3.0. Bash output shows ERROR: Could not find a version that satisfies the requirement py-irt==0.2.1 and ERROR: Could not find a version that satisfies the requirement py-irt==0.3.0.
  2. What I expected to happen: I expected the py-irt package to be bumped up from the current version on my machine (version 0.1.1) to 0.3.0 (latest release about 40 minutes ago).
  3. What actually happened: No upgrades occurred. Still stuck on py-irt version 0.1.1.
  4. Python version: 3.6.9. pip version: 21.1.3. All installations done inside virtual environment created using venv.

Could one of you please look into what the issue might be and why the versions are not actually available using pip? Is it just me or are other people running into this issue as well? Please respond at your earliest convenience! Thanks a ton!

Some Requests

Could you tell me whether the 1PL model uses the EM algorithm for parameter estimation? Also, is it possible to estimate parameters from my observational data if it is transformed into non-binary data? Thank you very much for your help.

3pl predict function returns predictions with values that exceed 1

The predict function in three_param_logistic.py returns:
return lambdas + (1 - lambdas / (1 + np.exp(-discs * (abilities - diffs))))
Which will return predicted scores ranging from 1 to 2. This contrasts with the 2PL model, which appears to correctly return predictions ranging from 0 to 1. I believe this is missing some parentheses and should be:
return lambdas + ((1 - lambdas) / (1 + np.exp(-discs * (abilities - diffs))))
to match the formula here: https://en.wikipedia.org/wiki/Item_response_theory#Three_parameter_logistic_model
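As a quick sanity check, here is a minimal NumPy sketch (with made-up parameter values, not taken from the library) showing that the original expression exceeds 1 while the parenthesized version stays between the guessing floor and 1:

import numpy as np

# hypothetical values for a single subject/item pair
lambdas, discs, abilities, diffs = 0.2, 1.0, 0.5, 0.0

buggy = lambdas + (1 - lambdas / (1 + np.exp(-discs * (abilities - diffs))))
fixed = lambdas + ((1 - lambdas) / (1 + np.exp(-discs * (abilities - diffs))))

print(buggy)  # ~1.08, above 1
print(fixed)  # ~0.70, between lambda = 0.2 and 1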

Evaluating a trained IRT model using `eval` in cli.py

Hey @EntilZha and @jplalor! Just saw that cli.py has an eval method that has not been implemented yet. Just wanted to raise an issue regarding the same since we wanted to know how to evaluate a trained IRT model. As per our conversation in issue #10, we have successfully finished training an IRT model on the scored predictions of the SQuAD dataset. How can we evaluate the trained model and explore the types of questions that are hardest/easiest/ambiguous for the model to correctly answer? Please respond at your earliest convenience!

Thanks!

What is `response` in `model_predictions.jsonlines`?

Hey @EntilZha @jplalor! I was looking at model_predictions.jsonlines to interpret the evaluation after training and evaluating a 4PL IRT model on squad-pyirt.jsonlines. I understand that the prediction field in model_predictions.jsonlines is the probability that the model with id subject_id got the question with id example_id correct, calculated using the 4PL mathematical formula. What does the response field in the same file refer to? Does it refer to whether the model actually got the answer to the question correct? I am attaching a screenshot of the table generated below for better understanding. Please respond at your earliest convenience! Thanks!

[screenshot of the model_predictions.jsonlines table omitted]
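In case it is useful to anyone else, here is a minimal sketch (assuming the field names described above: subject_id, example_id, prediction, response) for printing the two values side by side to compare them:

import json

with open("model_predictions.jsonlines") as f:
    for line in f:
        row = json.loads(line)
        # prediction: probability the subject answers the item correctly (4PL formula)
        # response: presumably the observed 0/1 outcome, which is the question here
        print(row["subject_id"], row["example_id"], row["prediction"], row["response"])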

Isn't the 3pl model wrong?

Shouldn't it be dist.Bernoulli(probs=lambdas[items] + (1 - lambdas[items]) * p_star) instead of dist.Bernoulli(probs=(1 - lambdas[items]) * p_star)?
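For reference, a minimal Pyro sketch of the likelihood with the guessing floor applied (the tensors below are made-up stand-ins for the sampled latent parameters, not the library's actual model code):

import torch
import pyro.distributions as dist

lambdas = torch.tensor([0.25, 0.10])   # per-item guessing parameters
p_star = torch.tensor([0.70, 0.40])    # 2PL-style success probabilities
items = torch.tensor([0, 1])

# the guessing parameter sets a floor on the success probability
probs = lambdas[items] + (1 - lambdas[items]) * p_star
likelihood = dist.Bernoulli(probs=probs)
print(likelihood.log_prob(torch.tensor([1.0, 0.0])))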

ValueError: Options for priors are vague and hierarchical

cli.evaluate(
  model_type='1pl',
  parameter_path='train-eval-1pl/best_parameters.json',
  test_pairs_path='test_pairs.jsonlines',
  output_dir='test-1pl')

Running the above evaluation code gives the error ValueError: Options for priors are vague and hierarchical.

  • 'train-eval-1pl/best_parameters.json' was trained with minitest.jsonlines
  • 'test_pairs.jsonlines' consists of example data as shown:
    {"subject_id": "ken", "item_id": "q1"}
    {"subject_id": "ken", "item_id": "q2"}
    {"subject_id": "burt", "item_id": "q1"}
    {"subject_id": "burt", "item_id": "q3"}

Ability for a subject with all correct responses (1) in 2PL comes out negative (lowest), and highest for a subject with all incorrect responses (0)

After training a 2PL IRT model on synthetic data, the ability of the subject with all responses correct (1) is the lowest (negative), and that of the subject with all responses incorrect (0) is the highest (positive). On training multiple times this sometimes reverses, so the ability scale is inconsistent.

How can I make the ability scale consistent, i.e. so that the subject with all correct responses always has the highest (positive) ability?

Experiment setup:

  1. Generated synthetic data with subject ids from user_0 to user_49 with the following conditions:
    - first user has all responses = 1
    - last user has all responses = 0
    - other users have random responses {0, 1}
  2. Trained 2 PL model using this:
    config = IrtConfig(model_type="2pl", initializers=["difficulty_sign"], epochs=5000, lr=0.1, priors="hierarchical")
    trainer = IrtModelTrainer(config=config, data_path="/content/data.jsonlines")
    trainer.train(device="cpu")
    item_correctness = trainer._dataset.get_item_accuracies()
    summary = trainer.best_params

Code to replicate the issue: COLAB

Let me know if there is something wrong with how I am training the model.
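For concreteness, a minimal sketch of how synthetic data like the above could be written out, assuming the usual py-irt jsonlines format with a subject_id string and a responses dict mapping item ids to 0/1:

import json
import random

items = [f"q{i}" for i in range(20)]

with open("data.jsonlines", "w") as f:
    for u in range(50):
        if u == 0:
            responses = {item: 1 for item in items}   # first user: all correct
        elif u == 49:
            responses = {item: 0 for item in items}   # last user: all incorrect
        else:
            responses = {item: random.randint(0, 1) for item in items}
        f.write(json.dumps({"subject_id": f"user_{u}", "responses": responses}) + "\n")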

3pl and 4pl parameter instability

Repeated runs of train-and-evaluate produce unstable estimates of item difficulty and discriminability. Lambdas (i.e. the guessing coefficients?) are unreasonably high (i.e. approaching 1).

I ran train-and-evaluate using the '4pl' model 5 times on the same dataset. The data consist of ~1700 subjects responding to 20 items. Presumably this is a large enough dataset to expect stable IRT model parameter estimates.

Correlations between estimates of item difficulty from run to run are as follows:

r      Run2    Run3    Run4    Run5
Run1   0.10    0.30    0.32   -0.52
Run2          -0.26   -0.30    0.00
Run3                   0.99   -0.54
Run4                          -0.58

Similar issues appeared when running the 3pl. However, the 2pl model appears to be stable, with correlations between run-to-run item difficulty estimates ranging from 0.89 to 0.99.
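For reference, a minimal sketch (with random placeholder values instead of the actual estimates) of how the run-to-run correlations above can be computed once the per-run difficulty estimates are aligned by item:

import numpy as np

# placeholder difficulty estimates for the same 20 items from two runs
run1_diffs = np.random.randn(20)
run2_diffs = np.random.randn(20)

r = np.corrcoef(run1_diffs, run2_diffs)[0, 1]
print(f"item-difficulty correlation between runs: {r:.2f}")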

Clean up requirements

Pip installation is slow at the moment. It would be nice to be more explicit with requirements so that pip doesn't have to do dependency resolution, which is slow.

Models used to get SQuAD data mentioned in former issues

Hi! I am trying to apply other datasets to this py-irt model, and I noticed that your SQuAD dataset had been converted to jsonlines with fields 'subjects' and 'responses'. Could I ask which models were used to generate these responses? Or is any model okay as long as the format is the same as the SQuAD data you provided? Thanks in advance!

py-irt: command not found

Hi @EntilZha and @jplalor,

I want to report a potential problem with py-irt.

I tried with Python 3.10 and got py-irt: command not found.

To reproduce:

python3.10 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install torch pyro-ppl py-irt

py-irt

Then, I got bash: py-irt: command not found.

Related: #17

Line 144 of cli.py's `evaluate` function throws an error when tested with scored predictions on squad data

Hey @jplalor @EntilZha! When testing all the functions written in cli.py with the scored predictions of the SQuAD data, train and train_and_evaluate work well, producing the desired training and evaluation results. But when testing the evaluate function separately by passing the squad.jsonlines file as TEST_PAIRS_PATH, the function throws the following error on Line 144:

KeyError: item_id is not a key in the dict

Am I passing in the wrong file into TEST_PAIRS_PATH? Have also opened issue #18 to clarify what TEST_PAIRS_PATH refers to. Please respond at your earliest convenience. Thanks a lot!

Request for Multidimensional IRT

Hey @jplalor @EntilZha!

We are working on multidimensional IRT. Could you please add commits related to multidimensional IRT if convenient? It would be extremely helpful for our next steps! Thanks in advance.

Question about input and output of example_with_rps

I have two questions and am looking forward to your kind answer.
1. Could you please tell me the meaning of the input to example_with_rps? Specifically, what do modelID, itemID, and response mean?
2. Also, may I ask what the output of example_with_rps.py means?
Thank you very much!

Fitting an IRT model on the SQuAD dataset

Thanks for your responses to issue #9 @EntilZha and @jplalor! We saw the description underneath the from_jsonlines() function, but are still unsure of how to convert the SQuAD dataset into the format required for the function (since the function description asks for the dataset to be in a specific format and I'm not sure if extracting just QA pairs and their corresponding labels from the SQuAD dataset would be enough). Could one of you please do a small test run with maybe 5 rows from the SQuAD dataset as an example?
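In case it helps, here is a rough sketch of what such a conversion could look like, assuming each model's scored predictions are available as a dict mapping question IDs to 0/1 correctness and that the target format is one jsonlines row per subject with a responses dict (the model and question names below are purely illustrative):

import json

# illustrative scored predictions: one entry per model (subject)
scored = {
    "model_a": {"squad_q1": 1, "squad_q2": 0, "squad_q3": 1},
    "model_b": {"squad_q1": 0, "squad_q2": 0, "squad_q3": 1},
}

with open("squad-pyirt.jsonlines", "w") as f:
    for model_name, responses in scored.items():
        f.write(json.dumps({"subject_id": model_name, "responses": responses}) + "\n")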

Null values in data

IRT is known for being able to incorporate null values, since some respondents may not answer every item. Does your package allow this in any way (e.g., masking or dataset configuration)?
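For illustration, a sparse representation in which unanswered items are simply left out of a subject's responses dict would look like the lines below; whether py-irt actually accepts or masks such missing entries is exactly what this issue is asking:

{"subject_id": "resp_1", "responses": {"q1": 1, "q2": 0, "q3": 1}}
{"subject_id": "resp_2", "responses": {"q1": 0, "q3": 1}}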

About MIRT value

Hi @EntilZha @jplalor

I have a question about the MIRT return value.

I checked the py-irt code and found that what we consider 'disc', 'diff', and 'fea' are handled in exactly the same way in the mirt and sirt parts. Both models save the 'loc_xxxx' parameters. The only difference is that in the mirt model these parameters are a two-dimensional array (num_subjects and dims), while in sirt each is a single value (num_subjects).

So what does 'dims' mean, and how can I get the real MIRT values?

Thanks a lot in advance! Look forward to your reply!
