

sdtm-mapper

Sam Tomioka

Feb 2019

sdtm-mapper is a Python package that generates machine-readable CDISC SDTM mapping specifications with help from AI. It can be used for the following tasks.

  1. Generate an empty specification for training data from a user-provided SAS dataset. The empty specification contains the SAS dataset attributes, so you don't need to run Proc Contents in SAS. SAS datasets may be in your AWS S3 bucket or in a local folder.
  2. Run models to generate mapping specifications.
  3. Build your own mapping algorithms using your data. The models can be trained to generate not only the target variables but also programming pseudocode.
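
As an illustration of the first task, an empty specification is essentially one row per source variable, with its attributes filled in and the mapping target left blank. The sketch below is not the sdtm-mapper API: the function names `empty_spec` and `to_csv`, the column layout, and the sample attributes are all assumptions made for this example.

```python
import csv
import io

# Hypothetical column layout for an empty specification (an assumption,
# not the layout sdtm-mapper actually writes).
SPEC_COLUMNS = ["dataset", "variable", "label", "type", "length", "sdtm_target"]


def empty_spec(dataset, attributes):
    """Return one spec row per source variable with a blank mapping target."""
    rows = []
    for attr in attributes:
        rows.append({
            "dataset": dataset,
            "variable": attr["name"],
            "label": attr.get("label", ""),
            "type": attr.get("type", ""),
            "length": attr.get("length", ""),
            "sdtm_target": "",  # left empty; filled in by a person or a model
        })
    return rows


def to_csv(rows):
    """Serialize spec rows to CSV text so the spec is machine readable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SPEC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up attributes for a raw adverse event dataset.
sample_attributes = [
    {"name": "AETERM", "label": "Reported Term", "type": "char", "length": 200},
    {"name": "AESTDAT", "label": "Start Date", "type": "num", "length": 8},
]
spec_rows = empty_spec("ae", sample_attributes)
spec_csv = to_csv(spec_rows)
```

The blank `sdtm_target` column is the part that training data would later fill with the correct target variable for each row.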

The first version comes with three pre-trained models (included in the package). Each is a feed-forward neural network with a trainable ELMo embedding layer, trained to predict 34 classes using adverse event datasets from 18 clinical trials. Validation used 3 clinical trials until the models were optimized, and testing used 1 clinical trial. The data for these 22 clinical trials were extracted from Medidata Rave databases built by 3 different CROs and Sunovion Pharmaceuticals.
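
The split described above is made at the trial level, so no trial contributes rows to more than one of the training, validation, and test sets. A minimal sketch of such a split follows. The function name `split_trials`, the placeholder trial identifiers, and the seeded shuffle are assumptions; how the original trials were assigned to each set is not documented here.

```python
import random


def split_trials(trial_ids, n_train, n_val, n_test, seed=0):
    """Partition trial identifiers into disjoint train/validation/test groups."""
    if n_train + n_val + n_test != len(trial_ids):
        raise ValueError("split sizes must add up to the number of trials")
    shuffled = list(trial_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# 22 placeholder trial identifiers, split 18 / 3 / 1 as in the text.
trials = [f"TRIAL{i:02d}" for i in range(1, 23)]
train_trials, val_trials, test_trials = split_trials(trials, 18, 3, 1)
```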

| Models | Parameters | Training Acc | Validation Acc | Test Acc* |
|---|---|---|---|---|
| 1. Elmo+sfnn+ae+Model1.h5 | 271,142 | 0.9795 | 0.9800 | 0.9540 |
| 2. Elmo+fnn+ae+Model2.h5 | 664,870 | 0.9846 | 1.0000 | 0.9425 |
| 3. Elmo+fnn+ae+Model3.h5 | 594,854 | 0.9966 | 1.0000 | 0.9666 |

Table 1 - Performance of three models
* Macro accuracy accounts for system variables classified as 'drop'.
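
One common reading of macro accuracy is the unweighted mean of per-class accuracy, so that a frequent class such as 'drop' does not dominate the score. Whether Table 1 was computed exactly this way is an assumption. The sketch below only illustrates the calculation on made-up labels, and `macro_accuracy` is a name chosen for this example.

```python
from collections import defaultdict


def macro_accuracy(y_true, y_pred):
    """Unweighted mean over classes of the fraction of correct predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    per_class = [correct[label] / total[label] for label in total]
    return sum(per_class) / len(per_class)


# Made-up labels: 'drop' marks system variables that are not mapped.
y_true = ["drop", "drop", "drop", "drop", "AETERM", "AESTDTC"]
y_pred = ["drop", "drop", "drop", "drop", "AETERM", "AETERM"]

score = macro_accuracy(y_true, y_pred)  # per-class accuracies: 1.0, 1.0, 0.0
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here the plain accuracy is 5/6 because 'drop' is frequent and always right, while the macro figure is 2/3 because the one missed class counts as much as 'drop' does.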

The high variance of the models may be due to the addition of CDASH metadata, and it is probably better to remove that metadata.

Peters et al. [1] explore several ways to improve the task-specific model:

  1. Freeze the weights of the pre-trained biLM, concatenate the context-independent token representation $x_k$ with $ELMo^{task}_{k}$, and pass $[x_k; ELMo^{task}_{k}]$ into the task RNN.
  2. Replace $h_k$ with $[h_k; ELMo^{task}_{k}]$. Peters et al. [1] have shown improved performance in some tasks, such as SNLI and SQuAD, by also including ELMo at the output of the task RNN.
  3. Add a moderate amount of dropout to ELMo.
  4. Regularize the ELMo weights by adding $\lambda \lVert w \rVert^2_2$ to the loss function.
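
To make the notation concrete: in Peters et al. [1], the ELMo vector for token k is a scaled, softmax-weighted sum of the biLM layer outputs for that token. The sketch below works through the concatenations and the L2 penalty on toy numbers with plain Python lists. The function names, the `scale` and `lam` parameters, and all values are assumptions for illustration, not code from sdtm-mapper.

```python
import math


def softmax(weights):
    """Normalize raw layer weights so they are positive and sum to 1."""
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def elmo_task_vector(layer_outputs, weights, scale):
    """Scaled, softmax-weighted sum of the biLM layer vectors for one token."""
    s = softmax(weights)
    dim = len(layer_outputs[0])
    return [
        scale * sum(s[j] * layer_outputs[j][d] for j in range(len(layer_outputs)))
        for d in range(dim)
    ]


def l2_penalty(weights, lam):
    """Regularization term lam * ||w||_2^2 added to the task loss."""
    return lam * sum(w * w for w in weights)


# Toy numbers: 3 biLM layers, 2-dimensional vectors, one token k.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [0.0, 0.0, 0.0]          # equal raw weights -> plain average of layers
x_k = [0.5, -0.5]            # context-independent token representation
h_k = [0.1, 0.2]             # output of the task RNN for token k

elmo_k = elmo_task_vector(layers, w, scale=1.0)
rnn_input = x_k + elmo_k     # [x_k; ELMo_k], fed into the task RNN
rnn_output = h_k + elmo_k    # [h_k; ELMo_k], used in place of h_k
penalty = l2_penalty([0.5, -0.5, 1.0], lam=0.001)
```

With all raw weights at zero, the softmax gives each layer equal weight, which is the average of the layers that the L2 penalty pulls the weights toward.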

These can be considered as future enhancements for other domains where the models may not perform well.

Here is the architecture of ELMo.

Figure 1 - biLM architecture for ELMo

pip install sdtm-mapper
  1. How to prepare training data using sdtm-mapper from SAS7bdat files?
  2. Tutorial on how to use sdtm-mapper to generate mapping specifications
  3. Train your data using SDTMMapper on Model 1: Note that you need to supply your training data.

You need an environment in which tensorflow, tensorflow-hub, and the other dependencies can be used.
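
A quick way to check the environment before running the tutorials is to ask Python whether the required modules can be found. This is a minimal sketch using only the standard library. The helper name `missing_packages` is an assumption, and the list covers only the two dependencies named above, not the package's full requirements.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names for the two dependencies named in the text.
required = ["tensorflow", "tensorflow_hub"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```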

If you would like to contribute models for other SDTM domains, please join the PhUSE ML Project Community. Most of the work has been done during weekends and evenings. Your contributions are always welcome!

Notes about the trained models:

The models were built and trained on raw AE datasets from clinical trials conducted by Sunovion Pharmaceuticals. The EDC system we use is Medidata RaveX. The training data contains some e-source data. Performance may therefore not be as good on your data. You can also build your own models with the SDTMMapper tool and use your custom model for your datasets.

The old README file is found here

For any questions, comments, suggestions, or issues, please post them here

For personal communication related to SDTMMapper, please contact Sam Tomioka

This is not an official Sunovion Pharmaceuticals product.

[1] Peters, M. et al. (2018). Deep contextualized word representations.
