
py-irt's People

Contributors

charchit7, dependabot[bot], entilzha, jplalor, leo-ware, pk1130, r-b-g-b, zouharvi

py-irt's Issues

One Question in example_rps.py

Could you please explain what 'response' means in this example? We are planning to apply this model to a QA task. Does 'response' indicate whether the model succeeded on the item, or whether the IRT model succeeded in predicting the matching relationship mentioned earlier?

Also, it would be great if there were some related documentation here.

Thanks a lot, and I look forward to your reply!

`py-irt: command not found` even after doing pip install --upgrade py-irt

Hey @EntilZha and @jplalor! Just saw that eval has been implemented :) Thanks a lot! I know that @jplalor had already published version 0.2.1 and done the poetry build required for py-irt to be used as a command in bash. For some reason, when I try to upgrade the package on pip, it doesn't recognize that there is a new version of the package available. To summarize the issue:

  1. What I tried to run: pip install --upgrade py-irt. Nothing gets upgraded; for every dependency I get an output of Requirement already satisfied in ........ Uninstalling and reinstalling does not help. To confirm that an issue exists, I uninstalled the package and tried installing intermediate versions by explicitly running pip install py-irt==0.2.1 and pip install py-irt==0.3.0. Bash output shows ERROR: Could not find a version that satisfies the requirement py-irt==0.2.1 and ERROR: Could not find a version that satisfies the requirement py-irt==0.3.0.
  2. What I expected to happen: I expected the py-irt package to be bumped up from the current version on my machine (version 0.1.1) to 0.3.0 (latest release about 40 minutes ago).
  3. What actually happened: No upgrades occurred. Still stuck on py-irt version 0.1.1.
  4. Python version: 3.6.9. pip version: 21.1.3. All installations done inside virtual environment created using venv.

Could one of you please look into what the issue might be and why the versions are not actually available using pip? Is it just me or are other people running into this issue as well? Please respond at your earliest convenience! Thanks a ton!

Some Requests

Could you tell me whether the 1PL model uses the EM algorithm for parameter estimation? Also, is it possible to estimate parameters from my observational data if it is transformed into non-binary data? Thank you very much for your help.

3pl predict function returns predictions with values that exceed 1

The predict function in three_param_logistic.py returns:
return lambdas + (1 - lambdas / (1 + np.exp(-discs * (abilities - diffs))))
Which will return predicted scores ranging from 1 to 2. This contrasts with the 2PL model, which appears to correctly return predictions ranging from 0 to 1. I believe this is missing some parentheses and should be:
return lambdas + ((1 - lambdas) / (1 + np.exp(-discs * (abilities - diffs))))
to match the formula here: https://en.wikipedia.org/wiki/Item_response_theory#Three_parameter_logistic_model
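As a quick sanity check, here is a minimal NumPy sketch (with made-up parameter values, not taken from the library) showing that the original expression exceeds 1 while the parenthesized version stays between the guessing floor and 1:

import numpy as np

# hypothetical values for a single subject/item pair
lambdas, discs, abilities, diffs = 0.2, 1.0, 0.5, 0.0

buggy = lambdas + (1 - lambdas / (1 + np.exp(-discs * (abilities - diffs))))
fixed = lambdas + ((1 - lambdas) / (1 + np.exp(-discs * (abilities - diffs))))

print(buggy)  # ~1.08, above 1
print(fixed)  # ~0.70, between lambda = 0.2 and 1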

Evaluating a trained IRT model using `eval` in cli.py

Hey @EntilZha and @jplalor! Just saw that cli.py has an eval method that has not been implemented yet. Just wanted to raise an issue regarding the same since we wanted to know how to evaluate a trained IRT model. As per our conversation in issue #10, we have successfully finished training an IRT model on the scored predictions of the SQuAD dataset. How can we evaluate the trained model and explore the types of questions that are hardest/easiest/ambiguous for the model to correctly answer? Please respond at your earliest convenience!

Thanks!

What is `response` in `model_predictions.jsonlines`?

Hey @EntilZha @jplalor! I was looking at model_predictions.jsonlines to interpret the evaluation after training and evaluating a 4PL IRT model on squad-pyirt.jsonlines. I understand that the prediction field in model_predictions.jsonlines is the probability that the model with id subject_id got the question with id example_id correct, calculated using the 4PL mathematical formula. What does the response field in the same file refer to? Does it refer to whether the model actually got the answer to the question correct? I am attaching a screenshot of the table generated below for better understanding. Please respond at your earliest convenience! Thanks!

[screenshot of the model_predictions.jsonlines table omitted]
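In case it is useful to anyone else, here is a minimal sketch (assuming the field names described above: subject_id, example_id, prediction, response) for printing the two values side by side to compare them:

import json

with open("model_predictions.jsonlines") as f:
    for line in f:
        row = json.loads(line)
        # prediction: probability the subject answers the item correctly (4PL formula)
        # response: presumably the observed 0/1 outcome, which is the question here
        print(row["subject_id"], row["example_id"], row["prediction"], row["response"])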

Isn't the 3pl model wrong?

Shouldn't it be dist.Bernoulli(probs=lambdas[items] + (1 - lambdas[items]) * p_star) instead of dist.Bernoulli(probs=(1 - lambdas[items]) * p_star)?
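For reference, a minimal Pyro sketch of the likelihood with the guessing floor applied (the tensors below are made-up stand-ins for the sampled latent parameters, not the library's actual model code):

import torch
import pyro.distributions as dist

lambdas = torch.tensor([0.25, 0.10])   # per-item guessing parameters
p_star = torch.tensor([0.70, 0.40])    # 2PL-style success probabilities
items = torch.tensor([0, 1])

# the guessing parameter sets a floor on the success probability
probs = lambdas[items] + (1 - lambdas[items]) * p_star
likelihood = dist.Bernoulli(probs=probs)
print(likelihood.log_prob(torch.tensor([1.0, 0.0])))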

ValueError: Options for priors are vague and hierarchical

cli.evaluate(
  model_type='1pl',
  parameter_path='train-eval-1pl/best_parameters.json',
  test_pairs_path='test_pairs.jsonlines',
  output_dir='test-1pl')

Running the above evaluation code gives the error ValueError: Options for priors are vague and hierarchical.

  • 'train-eval-1pl/best_parameters.json' was trained with minitest.jsonlines
  • 'test_pairs.jsonlines' consists of example data as shown:
    {"subject_id": "ken", "item_id": "q1"}
    {"subject_id": "ken", "item_id": "q2"}
    {"subject_id": "burt", "item_id": "q1"}
    {"subject_id": "burt", "item_id": "q3"}

Ability for a subject with all correct responses (1) in 2PL comes out negative (lowest), and highest for a subject with all incorrect responses (0)

After training a 2PL IRT model on synthetic data, the ability of the subject with all responses correct (1) is the lowest (negative), and that of the subject with all responses incorrect (0) is the highest (positive). On training multiple times this sometimes reverses, so the ability scale is inconsistent.

How can I make the ability scale consistent, i.e. so that the subject with all correct responses always has the highest (positive) ability?

Experiment setup:

  1. Generated synthetic data with subject ids from user_0 to user_49 with the following conditions:
    - first user has all responses = 1
    - last user has all responses = 0
    - other users have random responses {0, 1}
  2. Trained 2 PL model using this:
    config = IrtConfig(model_type="2pl", initializers=["difficulty_sign"], epochs=5000, lr=0.1, priors="hierarchical")
    trainer = IrtModelTrainer(config=config, data_path="/content/data.jsonlines")
    trainer.train(device="cpu")
    item_correctness = trainer._dataset.get_item_accuracies()
    summary = trainer.best_params

Code to replicate the issue: COLAB

Let me know if there is something wrong with how I am training the model.
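For concreteness, a minimal sketch of how synthetic data like the above could be written out, assuming the usual py-irt jsonlines format with a subject_id string and a responses dict mapping item ids to 0/1:

import json
import random

items = [f"q{i}" for i in range(20)]

with open("data.jsonlines", "w") as f:
    for u in range(50):
        if u == 0:
            responses = {item: 1 for item in items}   # first user: all correct
        elif u == 49:
            responses = {item: 0 for item in items}   # last user: all incorrect
        else:
            responses = {item: random.randint(0, 1) for item in items}
        f.write(json.dumps({"subject_id": f"user_{u}", "responses": responses}) + "\n")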

3pl and 4pl parameter instability

Repeated runs of train-and-evaluate produce unstable estimates of item difficulty and discriminability. Lambdas (i.e. the guessing coefficients?) are unreasonably high (i.e. approaching 1).

I ran train-and-evaluate using the '4pl' model 5 times on the same dataset. The data consist of ~1700 subjects responding to 20 items. Presumably this is a large enough dataset to expect stable IRT model parameter estimates.

Correlations between estimates of item difficulty from run to run are as follows:

r      Run2    Run3    Run4    Run5
Run1   0.10    0.30    0.32   -0.52
Run2          -0.26   -0.30    0.00
Run3                   0.99   -0.54
Run4                          -0.58

Similar issues appeared when running the 3pl. However, the 2pl model appears to be stable, with correlations between run-to-run item difficulty estimates ranging from 0.89 to 0.99.
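For reference, a minimal sketch (with random placeholder values instead of the actual estimates) of how the run-to-run correlations above can be computed once the per-run difficulty estimates are aligned by item:

import numpy as np

# placeholder difficulty estimates for the same 20 items from two runs
run1_diffs = np.random.randn(20)
run2_diffs = np.random.randn(20)

r = np.corrcoef(run1_diffs, run2_diffs)[0, 1]
print(f"item-difficulty correlation between runs: {r:.2f}")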

Clean up requirements

Pip installation is slow at the moment. It would be nice to be more explicit with requirements so that pip doesn't have to do dependency resolution, which is slow.

Models used to get SQuAD data mentioned in former issues

Hi! I am trying to apply other datasets to this py-irt model, and I noticed that your SQuAD dataset had been converted to jsonlines with fields 'subjects' and 'responses'. Could I ask which models were used to generate these responses? Or is any model okay as long as the format is the same as the SQuAD data you provided? Thanks in advance!

py-irt: command not found

Hi @EntilZha and @jplalor,

I want to report a potential problem with py-irt.

I tried with Python 3.10 and got py-irt: command not found.

To reproduce:

python3.10 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install torch pyro-ppl py-irt

py-irt

Then, I got bash: py-irt: command not found.

Related: #17

Line 144 of cli.py's `evaluate` function throws an error when tested with scored predictions on squad data

Hey @jplalor @EntilZha! When testing all the functions written in cli.py with the scored predictions of the SQuAD data, train and train_and_evaluate work well, producing the desired training and evaluation results. But when testing the evaluate function separately by passing the squad.jsonlines file as TEST_PAIRS_PATH, the function throws the following error on Line 144:

KeyError: item_id is not a key in the dict

Am I passing in the wrong file into TEST_PAIRS_PATH? Have also opened issue #18 to clarify what TEST_PAIRS_PATH refers to. Please respond at your earliest convenience. Thanks a lot!

Request for Multidimensional IRT

Hey @jplalor @EntilZha!

We are working on multidimensional IRT. Could you please add commits related to multidimensional IRT if convenient? It would be extremely helpful for our next steps! Thanks in advance.

Question about input and output of example_with_rps

I have two questions and am looking forward to your kind answer.
1. Could you please tell me the meaning of the input to example_with_rps? Specifically, what do modelID, itemID, and response mean?
2. Also, may I ask what the output of example_with_rps.py means?
Thank you very much!

Fitting an IRT model on the SQuAD dataset

Thanks for your responses to issue #9 @EntilZha and @jplalor! We saw the description underneath the from_jsonlines() function, but are still unsure of how to convert the SQuAD dataset into the format required for the function (since the function description asks for the dataset to be in a specific format and I'm not sure if extracting just QA pairs and their corresponding labels from the SQuAD dataset would be enough). Could one of you please do a small test run with maybe 5 rows from the SQuAD dataset as an example?
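In case it helps, here is a rough sketch of what such a conversion could look like, assuming each model's scored predictions are available as a dict mapping question IDs to 0/1 correctness and that the target format is one jsonlines row per subject with a responses dict (the model and question names below are purely illustrative):

import json

# illustrative scored predictions: one entry per model (subject)
scored = {
    "model_a": {"squad_q1": 1, "squad_q2": 0, "squad_q3": 1},
    "model_b": {"squad_q1": 0, "squad_q2": 0, "squad_q3": 1},
}

with open("squad-pyirt.jsonlines", "w") as f:
    for model_name, responses in scored.items():
        f.write(json.dumps({"subject_id": model_name, "responses": responses}) + "\n")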

Null values in data

IRT is known for being able to incorporate null values, since some respondents may not answer every item. Does your package allow this in any way (e.g., masking or dataset configuration)?
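For illustration, a sparse representation in which unanswered items are simply left out of a subject's responses dict would look like the lines below; whether py-irt actually accepts or masks such missing entries is exactly what this issue is asking:

{"subject_id": "resp_1", "responses": {"q1": 1, "q2": 0, "q3": 1}}
{"subject_id": "resp_2", "responses": {"q1": 0, "q3": 1}}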

About MIRT value

Hi @EntilZha @jplalor

I have a question about the MIRT return value.

I checked the py-irt code and found that what we consider 'disc', 'diff', and 'fea' are handled in exactly the same way in the mirt and sirt parts. Both models save the 'loc_xxxx' parameters. The only difference is that in the mirt model these parameters are a two-dimensional array (num_subjects and dims), while in sirt each is a single value (num_subjects).

So what does 'dims' mean, and how can I get the real MIRT values?

Thanks a lot in advance! Look forward to your reply!
