clin-summ

Clinical Text Summarization by Adapting LLMs | Nature Medicine

Official implementation from Stanford University

Datasets

We use six pre-existing open-source datasets, which are publicly accessible at the sources cited in our manuscript. Additionally, for datasets that do not require PhysioNet access, we provide our versions in data/:

  • opi: Open-i (radiology reports)
  • chq: MeQSum (patient/consumer health questions)
  • d2n: ACI-Bench (dialogue)

Models

In addition to proprietary models GPT-3.5 and GPT-4, we adapt the following open-source models available from HuggingFace:

Code

Set-up

  1. Use these commands to set up a conda environment:
       conda env create -f env.yml
       conda activate clin-summ
  2. In src/constants.py, set DIR_PROJECT to your own project directory outside this repository. It will contain input data, trained models, and generated output.
  3. Move input data from this repo to DIR_PROJECT, i.e. mv data/ DIR_PROJECT
  4. (optional) To add your own dataset, follow the format of the example datasets opi, chq, and d2n in DIR_PROJECT/data/
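
Before running anything, it can help to sanity-check the directory layout described in the set-up steps. The following is a minimal sketch, not part of the repository: the helper name check_project_dir and its arguments are hypothetical, and it only assumes what the steps above state (DIR_PROJECT lives outside the repo and holds data/ after the move).

```python
from pathlib import Path


def check_project_dir(dir_project, repo_root):
    """Return a list of problems with a proposed DIR_PROJECT (empty if none).

    Hypothetical helper for illustration; the repository does not ship it.
    """
    project = Path(dir_project).expanduser().resolve()
    repo = Path(repo_root).expanduser().resolve()
    problems = []
    # The set-up steps ask for a project directory outside the repository.
    if project == repo or repo in project.parents:
        problems.append(f"{project} is inside the repository {repo}")
    # After `mv data/ DIR_PROJECT`, datasets should live in DIR_PROJECT/data/.
    if not (project / "data").is_dir():
        problems.append(f"{project / 'data'} not found; move data/ there first")
    return problems
```

An empty list means both conditions hold; otherwise each string names one thing to fix.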

Usage

Below is a description of relevant scripts:

  • ./main.sh: Fine-tune open-source models, query, and compute metrics
  • python api/main.py: Query OpenAI models and compute metrics
    • First, enter the information for your Azure deployment in src/constants.py via RESOURCE and API_KEY
  • python src/gen_faiss_idx.py: (new datasets only) Determine the set of nearest-neighbor training examples for each sample. Alternatively, you can sample training examples at random.
  • src/UMLSScorer.py: Class definition for the MEDCON metric. To implement, follow these steps:
    1. Acquire approval for a UMLS license
    2. Follow the UMLS download instructions
    3. Adapt the provided script, src/UMLSScorer.py
    4. Call using the following two lines:
      • scorer = UMLSScorer()
      • medcon_score = scorer(string1, string2)
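
The two ways of choosing training examples mentioned for src/gen_faiss_idx.py (nearest neighbors versus random sampling) can be illustrated with a small sketch. This is not the repository's implementation: it swaps the FAISS index for a brute-force cosine-similarity search in plain Python, and the function names and toy embedding vectors are assumptions made up for the example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_examples(query_vec, train_vecs, k):
    """Indices of the k training vectors most similar to query_vec."""
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(query_vec, train_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def random_examples(n_train, k, seed=0):
    """Indices of k training examples drawn at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(n_train), k)


# Toy embeddings (hypothetical); real ones would come from a text encoder.
train_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
query_vec = [1.0, 0.05]

print(nearest_examples(query_vec, train_vecs, k=2))  # prints [0, 1]
print(random_examples(len(train_vecs), k=2))
```

Brute-force search scans every training vector per query, which is fine for a toy set; an index such as FAISS is the usual choice when the training set is large.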

Citation

@article{vanveen2024clinical,
  title={Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization},
  author={Van Veen, Dave and Van Uden, Cara and Blankemeier, Louis and Delbrouck, Jean-Benoit and Aali, Asad and Bluethgen, Christian and Pareek, Anuj and Polacin, Malgorzata and Collins, William and Ahuja, Neera and Langlotz, Curtis P. and Hom, Jason and Gatidis, Sergios and Pauly, John and Chaudhari, Akshay S.},
  journal={Nature Medicine},
  year={2024},
  doi={10.1038/s41591-024-02855-5},
  url={https://doi.org/10.1038/s41591-024-02855-5},
  published={27 February 2024}
}
