IMPC Data Analysis Code

The purpose of this code is to download KOMP data from the IMPC pipeline, covering a variety of procedures, phenotypes, and metadata across a variety of phenotyping centers.

This link can be used to find a list of the different procedures and the phenotypes and metadata that are recorded for them. Each procedure has a three-character procedure code, which the code uses to determine what type of data you wish to download.

Prerequisites

Python 3, Gawk, R, and a Unix emulator if you are on a PC (I use Cygwin)

Packages

Python

Python packages: csv, pandas, numpy, sys, requests, bs4, os, subprocess, time, datetime, re, glob, lxml, and pprint
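Of the modules listed above, csv, sys, os, subprocess, time, datetime, re, glob, and pprint ship with Python 3, so only pandas, numpy, requests, bs4, and lxml need to be installed (bs4 is distributed on PyPI as beautifulsoup4). As a minimal sketch, the check below reports which of those are not importable in the current environment. The helper name `missing_modules` is hypothetical and is not part of this repository.

```python
import importlib.util

# Third-party modules from the list above; the others ship with Python 3.
THIRD_PARTY = ["pandas", "numpy", "requests", "bs4", "lxml"]


def missing_modules(names):
    """Return the module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(THIRD_PARTY)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All third-party modules are available.")
```

Running this before the first script makes a missing dependency show up as a single message rather than as an import error partway through a download.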

R

R packages: ggplot2, stringr, dplyr, reshape2, lubridate, gdata, readr, tidyr, Hmisc, plyr, zoo, and signal

Step By Step Instructions For Running The Code

Run the scripts in the order listed here, from top to bottom.

First Python Script

  1. Edit the directory variable in IMPC_A.py to the directory you want the data to be downloaded to
  2. At the bottom of IMPC_A.py, change the procedure code in the function DownloadAll to the three-character code for the procedure data you would like to download. The codes are in the format AAA.
  3. Run the IMPC_A.py script
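Because a mistyped procedure code is an easy mistake in the steps above, a small check before running the script may help. The sketch below assumes that codes are exactly three uppercase letters, which is inferred only from the AAA format and the OFD example in this README. The function `is_valid_procedure_code` is hypothetical and does not exist in IMPC_A.py.

```python
import re

# Assumption: a procedure code is exactly three uppercase ASCII letters (e.g. OFD).
PROCEDURE_CODE = re.compile(r"[A-Z]{3}")


def is_valid_procedure_code(code):
    """Return True if code matches the assumed AAA format."""
    return PROCEDURE_CODE.fullmatch(code) is not None


if __name__ == "__main__":
    for candidate in ["OFD", "ofd", "OFDX"]:
        print(candidate, is_valid_procedure_code(candidate))
```

This only checks the shape of the code. It does not confirm that the code corresponds to a real IMPC procedure.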

Unix Code

  1. Set your working directory to the location of the raw data that the first script downloaded. The data is in a folder named after your procedure code (such as OFD), inside the original directory you chose for the download.
  2. Run the following Unix command, replacing AAA with the procedure code you are using. It replaces commas inside double-quoted fields with semicolons:
awk 'NR%2-1{gsub(/,/,";")}1' RS=\" ORS=\" AAA_data_2.csv > AAA_data.csv
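The awk command splits the file on double quotes, so every second record is text that sat inside quotes, and only those records have their commas replaced with semicolons. For readers less familiar with awk, here is a rough Python sketch of the same idea. The function names `protect_quoted_commas` and `clean_raw_file` are hypothetical, the file layout is taken from the steps above, and this is an illustration rather than a replacement for the command used in this workflow.

```python
from pathlib import Path


def protect_quoted_commas(text):
    """Replace commas inside double-quoted sections with semicolons."""
    parts = text.split('"')
    # After splitting on '"', the odd-indexed parts are the text inside quotes.
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].replace(",", ";")
    return '"'.join(parts)


def clean_raw_file(directory, code):
    """Read <directory>/<code>/<code>_data_2.csv and write <code>_data.csv beside it."""
    folder = Path(directory) / code
    raw = folder / f"{code}_data_2.csv"
    cleaned = folder / f"{code}_data.csv"
    cleaned.write_text(
        protect_quoted_commas(raw.read_text(encoding="utf-8")),
        encoding="utf-8",
    )
    return cleaned
```

One difference from the awk version: awk writes the output record separator after every record, including the last, so its output ends with an extra double quote. This sketch does not add one.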

Second Python Script

  1. Edit the directory variable in IMPC_B.py to the directory you want the data to be downloaded to
  2. At the bottom of IMPC_B.py, change the procedure code in the function create_wide_data to the three-character code for the procedure data you would like to download. The codes are in the format AAA.
  3. Run the IMPC_B.py script

R All Plots Script

  1. Change the directory variable to the directory you used in the two Python scripts
  2. Change the procedure_code variable to the procedure code you used in the two Python scripts
  3. Run the R script and it should begin generating plots

R Specific Plot Script

There are multiple R files, each of which generates a single plot type from the AllPlots.r script. If you only want to generate a specific type of plot, follow the same instructions as for the R All Plots Script, but apply them to the plot script you want to use.

Contributors

dagenaisbranden