alvi-khan / banglachq-summ

The data and code of 'BanglaCHQ-Summ: An Abstractive Summarization Dataset for Medical Queries in Bangla Conversational Speech', published in the Proceedings of the First Workshop on Bangla Language Processing, EMNLP 2023.

Home Page: https://aclanthology.org/2023.banglalp-1.10/

License: Other

Python 73.45% Jupyter Notebook 26.55%
bangla-nlp bioinformatics natural-language-processing summarization

BanglaCHQ-Summ: An Abstractive Summarization Dataset for Medical Queries in Bangla Conversational Speech

This repository contains the data and code of 'BanglaCHQ-Summ: An Abstractive Summarization Dataset for Medical Queries in Bangla Conversational Speech', published in the Proceedings of the First Workshop on Bangla Language Processing (BLP-2023), co-located with the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP).

All of the code and the dataset are provided under the CC BY-NC-SA 4.0 License. The code can be used to re-create the data preparation process we followed, as well as the training and evaluation of the benchmark models.

The repository is divided into the following sections:

  • Data Preparation: Contains the scripts used to collect, clean and prepare the final dataset. Extensive details are provided in this README file.

  • Dataset: Contains the final release of the BanglaCHQ-Summ dataset.

  • Scripts: Contains the scripts used for training and evaluating the benchmark models.

  • Graphics: Contains the scripts used to create the graphs presented in the paper.

The fine-tuned model weights can be found under the releases section.
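
The benchmark models are evaluated with ROUGE-L, among other metrics. As a rough illustration of what that score measures, the following is a minimal sketch of an LCS-based ROUGE-L F1 computation. It is not the evaluation script from this repository: the function names `lcs_length` and `rouge_l_f1` are hypothetical, whitespace tokenization is an assumption (Bangla text may need a dedicated tokenizer), and the unweighted F1 variant is an assumption as well.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Rolling-row dynamic programming: prev[j] holds the LCS length of the
    # tokens of `a` seen so far and the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between a reference summary and a candidate summary.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)  # share of candidate tokens in the LCS
    recall = lcs / len(ref_tokens)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder strings, not taken from the dataset.
    print(rouge_l_f1("a b c d", "a c d"))
```

For reported results, the scripts in the Scripts section remain the reference; this sketch only shows the shape of the metric.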

Citation

@inproceedings{khan-etal-2023-banglachq,
    title     = "{B}angla{CHQ}-Summ: An Abstractive Summarization Dataset for Medical Queries in {B}angla Conversational Speech",
    author    = "Khan, Alvi and Kamal, Fida and Chowdhury, Mohammad Abrar and Ahmed, Tasnim and Laskar, Md Tahmid Rahman and Ahmed, Sabbir",
    editor    = "Alam, Firoj and Kar, Sudipta and Chowdhury, Shammur Absar and Sadeque, Farig and Amin, Ruhul",
    booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
    month     = dec,
    year      = "2023",
    address   = "Singapore",
    publisher = "Association for Computational Linguistics",
    url       = "https://aclanthology.org/2023.banglalp-1.10",
    doi       = "10.18653/v1/2023.banglalp-1.10",
    pages     = "85--93",
    abstract  = "Online health consultation is steadily gaining popularity as a platform for patients to discuss their medical health inquiries, known as Consumer Health Questions (CHQs). The emergence of the COVID-19 pandemic has also led to a surge in the use of such platforms, creating a significant burden for the limited number of healthcare professionals attempting to respond to the influx of questions. Abstractive text summarization is a promising solution to this challenge, since shortening CHQs to only the information essential to answering them reduces the amount of time spent parsing unnecessary information. The summarization process can also serve as an intermediate step towards the eventual development of an automated medical question-answering system. This paper presents {`}BanglaCHQ-Summ{'}, the first CHQ summarization dataset for the Bangla language, consisting of 2,350 question-summary pairs. It is benchmarked on state-of-the-art Bangla and multilingual text generation models, with the best-performing model, BanglaT5, achieving a ROUGE-L score of 48.35{\%}. In addition, we address the limitations of existing automatic metrics for summarization by conducting a human evaluation. The dataset and all relevant code used in this work have been made publicly available.",
}
