MedicalResearchTextAnalyser

A system for linguistic inquiry and text analysis of medical research using UMLS.

Getting Started

N.B. Tested on Linux only (openSUSE Tumbleweed, kernel 6.5.9-1).

Environment Setup

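# create a virtual environment
python -m venv .venv
# activate it on Linux
source .venv/bin/activate
# activate it on Windows (use instead of the line above)
# .venv\Scripts\activate
# install the dependencies and the spaCy English model
pip install -r requirements.txt
python -m spacy download en_core_web_sm
# open the notebook
jupyter lab MedicalResearchTextAnalyser.ipynb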
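
Before opening the notebook, it may help to confirm that the packages named in the commands above are importable. The following is a minimal sketch using only the standard library. The helper name `check_modules` is hypothetical, and the module list covers only what the setup commands mention, since the full contents of requirements.txt are not shown here.

```python
from importlib.util import find_spec


def check_modules(names):
    """Map each top-level module name to True if it can be located."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Modules implied by the setup commands; adjust to match requirements.txt.
    modules = ["spacy", "en_core_web_sm", "jupyterlab"]
    for name, found in check_modules(modules).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated virtual environment so that it inspects the same interpreter the notebook will use.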

UMLS Setup

  1. Acquire a license and download the full UMLS resources.
  2. Install MetamorphoSys by following these instructions.
  3. Follow the QuickUMLS initialization process.
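
Once the QuickUMLS index has been built, concept extraction could look roughly like the sketch below. It assumes the `quickumls` package is installed and that an index directory exists. The helper names `load_matcher` and `collect_cuis` are hypothetical, and the threshold and similarity settings are illustrative values rather than the ones used in the notebook.

```python
def load_matcher(index_path):
    """Build a QuickUMLS matcher from an initialized index directory."""
    # Imported lazily so collect_cuis() can be used without QuickUMLS installed.
    from quickumls import QuickUMLS

    # threshold and similarity_name are illustrative, not the notebook's settings.
    return QuickUMLS(index_path, threshold=0.7, similarity_name="jaccard")


def collect_cuis(matcher, text):
    """Return the sorted, de-duplicated UMLS concept IDs (CUIs) found in text."""
    # match() returns one list of candidate dicts per matched span.
    spans = matcher.match(text, best_match=True, ignore_syntax=False)
    return sorted({candidate["cui"] for span in spans for candidate in span})
```

A call such as `collect_cuis(load_matcher("/path/to/quickumls_index"), abstract_text)` would then give the concepts for one abstract, where the path is a placeholder for the destination chosen during initialization and `abstract_text` is any string.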

Features

Dataset Collection from Scopus

  • Articles search query:

    TITLE ( ( radiology OR radiologist ) ) 
    AND ( TITLE ( ( ( automatic OR automated ) AND report ) OR ( artificial AND intelligence AND report ) OR ( deep AND learning AND report ) OR ( natural AND language AND processing ) OR ( large AND language AND model ) ) OR TITLE-ABS-KEY (report AND ( ( automatic OR automated ) OR ( artificial AND intelligence AND report ) OR ( deep AND learning AND report ) OR ( natural AND language AND processing ) OR ( large AND language AND model ) OR ( information AND retrieval ) OR ( computational AND linguistics ) )) ) 
    AND ( EXCLUDE ( DOCTYPE , "re" ) ) 
    AND ( LIMIT-TO ( LANGUAGE , "English" ) )
  • Reviews search query:

    TITLE ( ( radiology OR radiologist ) ) 
    AND ( TITLE ( ( ( automatic OR automated ) AND report ) OR ( artificial AND intelligence AND report ) OR ( deep AND learning AND report ) OR ( natural AND language AND processing ) OR ( large AND language AND model ) ) OR TITLE-ABS-KEY (report AND ( ( automatic OR automated ) OR ( artificial AND intelligence AND report ) OR ( deep AND learning AND report ) OR ( natural AND language AND processing ) OR ( large AND language AND model ) OR ( information AND retrieval ) OR ( computational AND linguistics ) )) ) 
    AND ( LIMIT-TO ( DOCTYPE , "re" ) ) 
    AND ( LIMIT-TO ( LANGUAGE , "English" ) )
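
The two queries above share everything except the document-type clause: the articles query excludes reviews (`"re"`), and the reviews query is limited to them. A minimal sketch of building both from the same parts is shown below. The constant and function names are hypothetical, and the strings are copied from the queries above.

```python
TITLE_CLAUSE = "TITLE ( ( radiology OR radiologist ) )"

# Topic clause shared by both queries, copied from the queries above.
TOPIC_CLAUSE = (
    "TITLE ( ( ( automatic OR automated ) AND report ) "
    "OR ( artificial AND intelligence AND report ) "
    "OR ( deep AND learning AND report ) "
    "OR ( natural AND language AND processing ) "
    "OR ( large AND language AND model ) ) "
    "OR TITLE-ABS-KEY (report AND ( ( automatic OR automated ) "
    "OR ( artificial AND intelligence AND report ) "
    "OR ( deep AND learning AND report ) "
    "OR ( natural AND language AND processing ) "
    "OR ( large AND language AND model ) "
    "OR ( information AND retrieval ) "
    "OR ( computational AND linguistics ) ))"
)

LANGUAGE_CLAUSE = 'LIMIT-TO ( LANGUAGE , "English" )'


def build_query(reviews_only):
    """Return the reviews query if reviews_only is true, else the articles query."""
    if reviews_only:
        doctype_clause = 'LIMIT-TO ( DOCTYPE , "re" )'
    else:
        doctype_clause = 'EXCLUDE ( DOCTYPE , "re" )'
    parts = [
        TITLE_CLAUSE,
        f"( {TOPIC_CLAUSE} )",
        f"( {doctype_clause} )",
        f"( {LANGUAGE_CLAUSE} )",
    ]
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_query(reviews_only=False))
    print(build_query(reviews_only=True))
```

Keeping the shared clauses in one place means that a change to the topic terms applies to both datasets.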

NLP Pipeline

  • Data Preprocessing

    • Tokenization
    • Stopword Removal
    • Stemming
  • Topic Modelling and Correlation Analysis

    • Empath
    • TF-IDF
    • UMLS
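
The preprocessing steps and the TF-IDF weighting listed above can be illustrated with a small standard-library sketch. It is not the notebook's implementation. The stopword list is a tiny illustrative sample, `naive_stem` is a toy suffix stripper standing in for a real stemmer, and the weighting uses the plain `tf * log(N / df)` variant, which may differ from what the notebook computes. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "is", "are", "with", "on"}


def naive_stem(token):
    """Strip a few common English suffixes (toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize on letters, drop stopwords, then stem each remaining token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [preprocess(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

With this variant, a term that appears in every document gets a weight of zero, so the highest-weighted terms are those that separate one abstract from the rest.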

Visualization

  • Word Clouds

  • Topic Histogram

  • TF-IDF Charts

  • Similarity Matrix Heatmaps

  • GUI
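
The values behind a similarity matrix heatmap can be computed as in the sketch below. It assumes cosine similarity over sparse term-weight vectors (for example TF-IDF weights), which is a common choice, though the measure used in the notebook is not stated here. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        # An all-zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Return the square matrix of pairwise cosine similarities."""
    return [[cosine(u, v) for v in vectors] for u in vectors]
```

The resulting list of lists is symmetric and can be passed to any plotting library's heatmap function.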

Future Work

References

  • [1] Olivier Bodenreider. “The Unified Medical Language System (UMLS): integrating biomedical terminology”. In: Nucleic Acids Research 32.suppl1 (Jan. 2004), pp. D267–D270. ISSN: 0305-1048. DOI: 10.1093/nar/gkh061. URL: https://doi.org/10.1093/nar/gkh061.
  • [2] H. Eyre et al. “Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python”. In: AMIA Annu Symp Proc 2021 (2021), pp. 438–447.
  • [3] Ethan Fast, Binbin Chen, and Michael S. Bernstein. “Empath: Understanding Topic Signals in Large-Scale Text”. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. CHI ’16. San Jose, California, USA: Association for Computing Machinery, 2016, pp. 4647–4657. ISBN: 9781450333627. DOI: 10.1145/2858036.2858535. URL: https://doi.org/10.1145/2858036.2858535.
  • [4] Andrey Kormilitzin et al. “Med7: a transferable clinical natural language processing model for electronic health records”. In: arXiv preprint arXiv:2003.01271 (2020).
  • [5] Zeljko Kraljevic et al. “Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit”. In: Artif. Intell. Med. 117 (July 2021), p. 102083. ISSN: 0933-3657. DOI: 10.1016/j.artmed.2021.102083.
  • [6] Ali Mozayan et al. “Practical Guide to Natural Language Processing for Radiology”. In: RadioGraphics 41.5 (2021). PMID: 34469212, pp. 1446–1453. DOI: 10.1148/rg.2021200113. URL: https://doi.org/10.1148/rg.2021200113.
  • [7] Mark Neumann et al. “ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing”. In: Proceedings of the 18th BioNLP Workshop and Shared Task. Florence, Italy: Association for Computational Linguistics, Aug. 2019, pp. 319–327. DOI: 10.18653/v1/W19-5034. URL: https://www.aclweb.org/anthology/W19-5034.
  • [8] Yifan Peng et al. NegBio: a high-performance tool for negation and uncertainty detection in radiology reports. 2017.
  • [9] Radim Řehůřek and Petr Sojka. “Software Framework for Topic Modelling with Large Corpora”. English. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. http://is.muni.cz/publication/884893/en. Valletta, Malta: ELRA, May 2010, pp. 45–50.
  • [10] Guergana K Savova et al. “Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications”. In: Journal of the American Medical Informatics Association 17.5 (Sept. 2010), pp. 507–513. ISSN: 1067-5027. DOI: 10.1136/jamia.2009.001560. URL: https://doi.org/10.1136/jamia.2009.001560.
  • [11] Luca Soldaini. “QuickUMLS: a fast, unsupervised approach for medical concept extraction”. In: 2016. URL: https://api.semanticscholar.org/CorpusID:2990304.
  • [12] Song Wang et al. Radiology Text Analysis System (RadText): Architecture and Evaluation. 2022.
