Giter Club home page Giter Club logo

ml_loinc_mapping's Introduction

ml_loinc_mapping

The ml_loinc_mapping module was created to map laboratory tests to their respective LOINC codes by building a model that leverages noisy labels in existing laboratory data.

Dependencies

Python

  • Python (>= 3.5)
  • numpy (>= 1.11.0)
  • scikit-learn (>= 0.17.1)
  • networkx=1.11
  • rpy2
  • hyperopt

R

  • stringdist (used in Python code for string distance matching via Python rpy2 package)

UMLS

  • User will need to have a UMLS username/password login to obtain a UMLS api key (detailed in configuration file)

LOINC

Configuration

Please carefully review the config.py file. This program assumes that you have collected raw laboratory data and aggregated it, grouping by [Site, Lab Test Name, Specimen Type, Units (missingness allowed), and locally-recorded LOINC code (missingness allowed)]. Within the above grouping, the user is expected to have created summary measures of the numeric test results within the aggregated groups. The summary measures used as features in the algorithm include [Mean, Minimum, Maximum, 5th Percentile, 25th Percentile, Median, 75th Percentile, 95th percentile, and Count] Because laboratory test names may contain commas, it is recommended that the user ensure that the delimiter in their raw data file is unique (i.e. '|') The aggregate source data should be stored in a .txt, .csv, or .xls file

The following fields in the config.py file require the user to enter information:

  • out_dir
  • in_file
  • loinc_file_path
  • lib_loc (for R strindist package)
  • test_col (see config.py file)
  • spec_col (see config.py file)
  • units (see config.py file)
  • loinc_col (see config.py file)
  • min_col (see config.py file)
  • max_col (see config.py file)
  • mean_col (see config.py file)
  • perc_5 (see config.py file)
  • perc_25 (see config.py file)
  • median_col (see config.py file)
  • perc_75 (see config.py file)
  • perc_95 (see config.py file)
  • count (see config.py file)
  • site (see config.py file)
  • missing (see config.py file)
  • api_key

The following fields in the config.py file may optionally be configured:

  • delim
  • write_file_source_data_cleaning
  • write_file_loinc_parsed
  • write_file_umls_cuis
  • rejection_threshold
  • num_cuis
  • min_sites_per_loinc_key
  • min_tests_per_loinc_group
  • min_row_count_per_loinc_group
  • run_cv
  • n_splits
  • tuning_evals
  • max_features
  • max_depth
  • min_samples_split
  • n_estimators

Following Configuration:

Once all dependencies are installed and configurations are completed in the config.py file, the user needs only to execute the Analysis.py file, which fits the machine learning model, makes predictions on both labeled and unlabeled data, and outputs results to the labeled_data_predictions.csv and unlabeled_data_predictions.csv files.

ml_loinc_mapping's People

Contributors

skparr avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.