
MEDIQA-Chat-2023-WangLab

This repository contains our submission (and the resulting short paper) to the MEDIQA-Chat Shared Task @ ACL-ClinicalNLP 2023.

Table of contents

  • Installation
  • Usage
  • Pre-trained models, outputs and results
  • Submitting to the shared task
  • Citing

Installation

Requires python>=3.8. First, create and activate a virtual environment, then install the requirements:

pip install -r requirements.txt
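
For example, using Python's built-in venv module (the environment name .venv is arbitrary; conda or virtualenv work just as well):

# A minimal setup sketch; any virtual environment tool works
python3 -m venv .venv            # create the virtual environment
source .venv/bin/activate        # activate it (Linux/macOS)
pip install -r requirements.txt  # install the pinned dependencies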

Note: For setup on a cluster managed by the Digital Research Alliance of Canada (the Alliance), please see ./scripts/slurm/setup_on_arc.sh.

Usage

Fine-tuning a model on the shared task data

Models can be fine-tuned on the shared task data using run_summarization.py, which is adapted from the HuggingFace script of the same name. To see all available options, run:

python ./scripts/run_summarization.py --help

Arguments can be modified in the config files or passed as command-line arguments. Valid arguments are anything from the HuggingFace TrainingArguments or Seq2SeqTrainingArguments classes, or the arguments specified in the script itself. At a minimum, you must provide paths to the dataset partitions via train_file, validation_file and, optionally, test_file.
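
For example (the CSV paths below are placeholders; point them at your local copies of the shared task data):

# Placeholder paths; substitute your local shared task CSVs
python ./scripts/run_summarization.py "./conf/base.yml" \
    train_file="./data/TaskA-TrainingSet.csv" \
    validation_file="./data/TaskA-ValidationSet.csv" \
    output_dir="./output/taskA"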

Training

To train the model, run one of the following:

# Task A (train)
python ./scripts/run_summarization.py "./conf/base.yml" "./conf/taskA.yml" \
    output_dir="./output/taskA"
# Task B (train)
python ./scripts/run_summarization.py "./conf/base.yml" "./conf/taskB.yml" \
    output_dir="./output/taskB"

Note: base.yml contains sensible default arguments that should be used for all experiments. taskA.yml and taskB.yml contain arguments specific to Task A and Task B, respectively. Arguments passed on the command line override those in the config files.
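
For example, to override a value from base.yml for a single run (learning_rate is a standard Seq2SeqTrainingArguments field; the value here is illustrative):

# Command-line arguments take precedence over the config files
python ./scripts/run_summarization.py "./conf/base.yml" "./conf/taskA.yml" \
    output_dir="./output/taskA" \
    learning_rate=3e-5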

Validation

To evaluate a trained model on the validation set, run one of the following:

# Task A
python ./scripts/run_summarization.py "./conf/base.yml" "./conf/taskA.yml" \
    output_dir="./output/taskA/fine_tune" \
    model_name_or_path="./path/to/model/checkpoint" \
    do_train=False \
    do_eval=True
# Task B
python ./scripts/run_summarization.py "./conf/base.yml" "./conf/taskB.yml" \
    output_dir="./output/taskB/fine_tune" \
    model_name_or_path="./path/to/model/checkpoint" \
    do_train=False \
    do_eval=True

Testing

To make predictions with a trained model on the test set, see Submitting to the shared task below.


By default, the model will be evaluated with ROUGE, BERTScore and BLEURT. You can change the underlying models for BERTScore and BLEURT via the bertscore_model_type and bleurt_checkpoint arguments. We chose reasonable defaults, which balance model size and evaluation time against automatic metric performance. For more information on the available models and their metric performance, see the BERTScore and BLEURT documentation.
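
For example (both values below are illustrative, not necessarily our defaults: bertscore_model_type takes a HuggingFace model identifier supported by BERTScore, and bleurt_checkpoint the name of a released BLEURT checkpoint):

# Illustrative metric models; swap in any supported BERTScore model / BLEURT checkpoint
python ./scripts/run_summarization.py "./conf/base.yml" "./conf/taskA.yml" \
    output_dir="./output/taskA/fine_tune" \
    bertscore_model_type="microsoft/deberta-xlarge-mnli" \
    bleurt_checkpoint="BLEURT-20"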

Results will be automatically logged to any integrations that are installed and supported by the HuggingFace Trainer. If do_predict=True, a file containing the model's predictions, formatted for submission to the shared task, will be saved to output_dir / "taskX_wanglab_runY.csv", where X corresponds to the script argument task and Y to the script argument run.
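
For example, the following sketch (the checkpoint and test set paths are placeholders, and we assume task and run take the same key=value syntax as the other script arguments) would write predictions to ./output/taskA/fine_tune/taskA_wanglab_run1.csv:

# Placeholder paths; writes taskA_wanglab_run1.csv under output_dir
python ./scripts/run_summarization.py "./conf/base.yml" "./conf/taskA.yml" \
    output_dir="./output/taskA/fine_tune" \
    model_name_or_path="./path/to/model/checkpoint" \
    test_file="./path/to/test.csv" \
    do_train=False \
    do_predict=True \
    task="A" \
    run="1"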

We also provide a SLURM submission script for ARC clusters, which can be found at ./scripts/slurm/run_summarization.sh.

Generate notes with LangChain

To generate notes with a large language model (LLM, via LangChain), use the run_langchain.py script. To see all available options, run:

python ./scripts/run_langchain.py --help

To reproduce our best results for Task B, run the following:

# Task B
OPENAI_API_KEY="..." python scripts/run_langchain.py \
    "./MEDIQA-Chat-TestSets-March-15-2023/TaskB/taskB_testset4participants_inputConversations.csv" \
    "./output/taskB/in_context_learning" \
    --train-fp "./MEDIQA-Chat-Training-ValidationSets-Feb-10-2023/TaskB/TaskB-TrainingSet.csv" \
    --task "B" \
    --run "1"

You will need to provide your own OPENAI_API_KEY.

Note: Due to the non-deterministic nature of OpenAI's models and API, results may vary slightly from our reported results.

Pre-trained models, outputs and results

All model outputs and results (as well as data from the human evaluation) reported in our paper are available in the data/paper directory.

Submitting to the shared task

To submit a run to the shared task, we used the following commands:

./scripts/submission/install.sh
./scripts/submission/activate.sh
# Then, choose one of the decode scripts, e.g.
./scripts/submission/decode_taskA_run1.sh

The submission scripts also demonstrate how to make predictions on the test set using a trained model.

Citing

If you use our model in your work, please consider citing our paper:

@inproceedings{giorgi-etal-2023-wanglab,
	title        = {{W}ang{L}ab at {MEDIQA}-Chat 2023: Clinical Note Generation from Doctor-Patient Conversations using Large Language Models},
	author       = {Giorgi, John  and Toma, Augustin  and Xie, Ronald  and Chen, Sondra  and An, Kevin  and Zheng, Grace  and Wang, Bo},
	year         = 2023,
	month        = jul,
	booktitle    = {Proceedings of the 5th Clinical Natural Language Processing Workshop},
	publisher    = {Association for Computational Linguistics},
	address      = {Toronto, Canada},
	pages        = {323--334},
	url          = {https://aclanthology.org/2023.clinicalnlp-1.36},
	abstract     = {This paper describes our submission to the MEDIQA-Chat 2023 shared task for automatic clinical note generation from doctor-patient conversations. We report results for two approaches: the first fine-tunes a pre-trained language model (PLM) on the shared task data, and the second uses few-shot in-context learning (ICL) with a large language model (LLM). Both achieve high performance as measured by automatic metrics (e.g. ROUGE, BERTScore) and ranked second and first, respectively, of all submissions to the shared task. Expert human scrutiny indicates that notes generated via the ICL-based approach with GPT-4 are preferred about as often as human-written notes, making it a promising path toward automated note generation from doctor-patient conversations.}
}
