laySummarisation

Project as part of COMP34812: Natural Language Understanding

Installation

Preferably use Poetry. See the Poetry documentation for installation instructions.

When using Poetry with VS Code, or any IDE that manages a local virtual environment, keep the venv that Poetry creates inside the project directory. To do this, run the following command before installing:

poetry config virtualenvs.in-project true

Then use the following command to install all dependencies:

poetry install

For further info on managing packages etc, see the Poetry docs.
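
The two installation commands above can be wrapped in a small helper that fails early when Poetry is missing. This is a minimal sketch, not part of the repository; the function name `install_deps` is hypothetical.

```shell
set -eu

# Hypothetical helper: configure an in-project virtualenv, then install.
install_deps() {
    # Stop with a readable message if Poetry is not on PATH.
    if ! command -v poetry >/dev/null 2>&1; then
        echo "poetry not found; install it first" >&2
        return 1
    fi
    # With this setting Poetry creates the virtualenv in ./.venv,
    # where an IDE can discover it.
    poetry config virtualenvs.in-project true
    poetry install
}
```

Call `install_deps` from the repository root; afterwards the interpreter to select in the IDE should live under `.venv`.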

Data and weights

To access the internet from the CSF cluster, you need to use the proxy module:

module load tools/env/proxy2
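
After loading the module, it can help to confirm that a proxy is actually visible to download tools before starting a long transfer. The sketch below assumes the module exports the conventional `https_proxy` or `HTTPS_PROXY` variable, which this README does not confirm; the function name `check_proxy` is hypothetical.

```shell
set -eu

# Hypothetical check: report whether a proxy variable is set.
# Assumption: the proxy module exports https_proxy or HTTPS_PROXY.
check_proxy() {
    proxy="${https_proxy:-${HTTPS_PROXY:-}}"
    if [ -n "$proxy" ]; then
        echo "proxy set: $proxy"
    else
        echo "no proxy variable set" >&2
        return 1
    fi
}
```

If `check_proxy` fails on a CSF node, the module was probably not loaded in the current shell.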

Using the scripts in the scripts folder, you can automatically download and extract the data.

Run the following command to set up both datasets:

./scripts/setup_data.sh

If downloading the data does not work, download it manually from the link below and extract the contents to data/orig:

https://drive.google.com/uc?id=1FFfa4fHlhEAyJZIM2Ue-AR6Noe9gOJOF&export=download
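
For the manual route, the extraction step might look like the sketch below. The archive name and format are assumptions, since the README does not say what the download contains; the function name `extract_to_orig` is hypothetical, and only `.zip` and `.tar.gz`/`.tgz` archives are handled.

```shell
set -eu

# Hypothetical helper: extract a manually downloaded archive into data/orig.
# $1 is the file the browser saved; $2 optionally overrides the destination.
extract_to_orig() {
    archive="$1"
    dest="${2:-data/orig}"
    mkdir -p "$dest"
    case "$archive" in
        *.zip)            unzip -q "$archive" -d "$dest" ;;
        *.tar.gz | *.tgz) tar -xzf "$archive" -C "$dest" ;;
        *)                echo "unsupported archive: $archive" >&2; return 1 ;;
    esac
}
```

Run it from the repository root, passing the path of the downloaded file, so that the default destination resolves to data/orig.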

Running the code

First download the data and weights as described above.

Then run the following pre-processing code to pre-process the entire dataset:

./scripts/process_all.sh

Or run individually with ./scripts/data/elife.sh and ./scripts/data/plos.sh.
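
When running the datasets individually, a short loop makes it obvious which script failed. This is a sketch rather than a script shipped with the repository; `preprocess_each` and the `SCRIPT_DIR` override are hypothetical names, and the default path is the one given above.

```shell
set -eu

# Hypothetical helper: run the per-dataset pre-processing scripts in turn.
# SCRIPT_DIR defaults to the folder named in this README.
preprocess_each() {
    script_dir="${SCRIPT_DIR:-./scripts/data}"
    for name in elife plos; do
        script="$script_dir/$name.sh"
        # Fail early if a script is missing or not executable.
        if [ ! -x "$script" ]; then
            echo "missing or not executable: $script" >&2
            return 1
        fi
        echo "pre-processing $name"
        "$script"
    done
}
```

Because of `set -e`, the loop stops at the first dataset whose script exits with an error.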

CSF

Look at the jobs folder.

Models

Extractor model (based on https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT): https://drive.google.com/drive/folders/1w-LhpA1ek5V10wUImU0BiRhjNOjihZiQ?usp=sharing

GPT2 Model (based on https://huggingface.co/gpt2): https://livemanchesterac-my.sharepoint.com/:f:/g/personal/ahmed_soliman-2_student_manchester_ac_uk/EjcG0NcNbpRKoCvUUzzRCrwBKNE4RN-IQfh_ZUiVB3Tkvg?e=SOGPs1

Clinical Longformer model (based on https://huggingface.co/yikuan8/Clinical-Longformer): https://drive.google.com/drive/folders/1QtFVqKHtmj_T64Vanyyrm5mnigl7_TJ4?usp=sharing

ClinicalT5 model (based on https://physionet.org/content/clinical-t5/1.0.0/): https://livemanchesterac-my.sharepoint.com/:f:/g/personal/ahmed_soliman-2_student_manchester_ac_uk/EjcG0NcNbpRKoCvUUzzRCrwBKNE4RN-IQfh_ZUiVB3Tkvg?e=SOGPs1

Dataset

The datasets used were provided as part of the BioLaySumm 2023 Challenge. The two datasets are academic articles from PLOS and eLife.

The data can be accessed at the following link: https://biolaysumm.org/

(If that link doesn't work, use the Google Drive link above.)

Data source:

[1] Tomas Goldsack, Zhihao Zhang, Chenghua Lin, Carolina Scarton. Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), Abu Dhabi.

[2] Zheheng Luo, Qianqian Xie, Sophia Ananiadou. Readability Controllable Biomedical Document Summarization. Findings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022 Findings), Abu Dhabi.

Links

https://aclweb.org/aclwiki/BioNLP_Workshop

Contributors

ahmedsoliman360, vladislavlhp7, wenzlawski
