- Install dependencies:
$ pip install -r requirements.txt
- Copy data for task 2 into the `data` directory.
We use two different train/test sets for evaluating our models before submission. To train on set 1:
$ python src/main.py train ModelType --data data/train_set_1.txt
By default, the trained model is saved in the current working directory with a timestamp in the filename. Use `python src/main.py train --help` to see the available model types and other options.
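
For orientation, the save step amounts to pickling the trained model under a timestamped filename. The sketch below only illustrates that idea; the exact filename pattern and save logic are assumptions, not the actual code in `src/main.py`:

```python
# Illustration of the timestamped save behaviour described above; the exact
# filename pattern used by src/main.py may differ.
import pickle
from datetime import datetime

def save_model(model, model_type="ModelType"):
    # e.g. "ModelType_2024-01-31_14-05-02.pickle" (naming pattern is an assumption)
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    filename = f"{model_type}_{timestamp}.pickle"
    with open(filename, "wb") as f:
        pickle.dump(model, f)
    return filename
```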
To optimize the threshold scheduler parameters, use:
$ python src/main.py optimize-threshold my_model.pickle --metrics erde5 erde50 latency-f1 --data data/train_set_1.txt
This performs a grid search over the range of parameter values defined in the model class and stores the optimized model in a second file (by default, appending `.optimized` to the filename base).
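
Conceptually, the optimization is a plain grid search over the parameter ranges declared by the model class. The sketch below conveys the idea only; the parameter names, scheduler interface, and scoring helper are assumptions, not the repository's actual API:

```python
# Conceptual sketch of the threshold-scheduler grid search. The parameter grid,
# the set_params() interface, and the evaluate_fn helper are assumptions.
import itertools
import os
import pickle

def optimize_threshold(model_path, metrics, data, param_grid, evaluate_fn):
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    best_params, best_score = None, float("-inf")
    # Try every combination of scheduler parameter values.
    for values in itertools.product(*param_grid.values()):
        params = dict(zip(param_grid, values))
        model.threshold_scheduler.set_params(**params)   # assumed interface
        score = evaluate_fn(model, data, metrics)         # assumed: higher is better
        if score > best_score:
            best_params, best_score = params, score
    model.threshold_scheduler.set_params(**best_params)
    # Append ".optimized" to the filename base, e.g. my_model.optimized.pickle.
    base, ext = os.path.splitext(model_path)
    optimized_path = f"{base}.optimized{ext}"
    with open(optimized_path, "wb") as f:
        pickle.dump(model, f)
    return optimized_path
```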
To see information about a saved model, including optimized threshold scheduler parameters, use `python src/main.py info my_model.pickle`.
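
In essence, the info command loads the pickle and reports the model's type and its (possibly optimized) threshold scheduler parameters; the attribute names in this sketch are assumptions:

```python
# Rough sketch of what the info command reports; attribute names are assumptions.
import pickle
import sys

with open(sys.argv[1], "rb") as f:
    model = pickle.load(f)
print("Model type:", type(model).__name__)
print("Threshold scheduler:", vars(model.threshold_scheduler))  # assumed attribute
```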
The final submission will work via a JSON API, so during development we evaluate our results and test our submission client against a local dummy API (running at http://localhost:5000) that exposes the same interface.
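
A heavily simplified sketch of such a dummy API is shown below. The endpoint names, payload fields, and results layout are assumptions made for illustration only; the actual interface is defined in `src/api.py`:

```python
# Minimal illustrative dummy API in the spirit of src/api.py; endpoints and
# payload shapes are assumptions, not the real interface.
from flask import Flask, jsonify, request

app = Flask(__name__)
decisions = []  # decisions received from the submission client

@app.route("/writings", methods=["GET"])
def writings():
    # Serve a chunk of test data (hard-coded example here).
    return jsonify([{"subject": "subject1", "text": "example writing"}])

@app.route("/submit", methods=["POST"])
def submit():
    # Store the client's per-subject decisions for later analysis.
    decisions.append(request.get_json())
    return jsonify({"status": "ok"})

@app.route("/results", methods=["GET"])
def results():
    return jsonify(decisions)

if __name__ == "__main__":
    app.run(port=5000)
```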
- Run the local submission API, providing the test set and the number of runs as arguments. For example, using our test set 1:
$ python src/api.py --data data/test_set_1.txt --runs 2
- Run the submission client (one model per run; a rough sketch of one client round follows this list):
$ python src/main.py submit path/to/model1 path/to/model2 ...
- Go to http://localhost:5000/results to analyze results.
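
For reference, one round of the submission client roughly amounts to the exchange below, assuming the illustrative `/writings` and `/submit` endpoints from the dummy-API sketch above; the real request/response handling lives in `src/main.py`, and `predict()` is an assumed model interface:

```python
# Rough sketch of one submission round against the local dummy API; endpoint
# names and the model's predict() interface are assumptions for illustration.
import pickle
import requests

API = "http://localhost:5000"

def run_submission(model_path):
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    # Fetch the current batch of writings from the (dummy) API.
    writings = requests.get(f"{API}/writings").json()
    # Let the model decide for each subject.
    decisions = [{"subject": w["subject"], "decision": model.predict(w["text"])}
                 for w in writings]
    # Post the decisions back to the API.
    requests.post(f"{API}/submit", json=decisions)
```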