spreadsheets_transformationii's Introduction

A Wasie KnowEnG's Spreadsheets Transformation pipeline (saved experiment)

This is the Spreadsheets Transformation Pipeline of the Knowledge Engine for Genomics (KnowEnG), an NIH BD2K Center of Excellence.

This pipeline applies various transformations to one or more spreadsheets (genomic, phenotypic, ...).

There are ten transformation methods to choose from:

| Options | Method | Parameters |
| --- | --- | --- |
| Subset Based on Phenotype category and id | select subtype | spreadsheet, phenotype, id, category |
| Intersection | common samples | two spreadsheets |
| Subset Genes | select genes | spreadsheet, list |
| Union | merge | two spreadsheets |
| Group Then Apply a Function | cluster statistics | spreadsheet, labels |
| Spreadsheet numerical transform | numerical transform | spreadsheet, transformation name |
| Spreadsheet statistics | stats | spreadsheet, statistic name |
| Spreadsheet transpose | run_transpose | one spreadsheet |
| Kaplan-Meier | run_kaplan_meier | spreadsheet, cluster_id, event, time |
| Spreadsheet category to binary | run_category_binary | spreadsheet, category |

Table of Transformations


  1. Subset Based on Phenotype category and id
  2. Intersection
  3. Subset Genes
  4. Union
  5. Group then apply a function
  6. Spreadsheet numerical transform
  7. Spreadsheet statistics
  8. Spreadsheet transpose
  9. Kaplan-Meier
  10. Category to Binary

1. Subset Based on a Phenotype category and category name

Select the samples whose value in a chosen phenotype column matches a given category, e.g., patients with longer survival. The output is a smaller spreadsheet, possibly with fewer columns.
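
As an illustration only, the selection can be sketched with pandas. The function name `select_subtype`, the genes x samples orientation, and the toy data are assumptions made for this example, not the pipeline's actual code.

```python
import pandas as pd


def select_subtype(spreadsheet_df, phenotype_df, phenotype_id, category):
    """Keep only the samples whose phenotype column equals `category`."""
    # Samples (phenotype rows) whose value in the chosen column matches.
    matching = set(phenotype_df.index[phenotype_df[phenotype_id] == category])
    # Preserve the spreadsheet's column order; skip samples it does not contain.
    keep = [sample for sample in spreadsheet_df.columns if sample in matching]
    return spreadsheet_df.loc[:, keep], phenotype_df.loc[keep]


# Hypothetical genes x samples spreadsheet and samples x phenotype spreadsheet.
genes = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    index=["GENE_A", "GENE_B"],
    columns=["S1", "S2", "S3"],
)
pheno = pd.DataFrame({"grade": ["high", "low", "high"]}, index=["S1", "S2", "S3"])

sub_genes, sub_pheno = select_subtype(genes, pheno, "grade", "high")
print(sub_genes)
```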

2. Intersection

Find the intersection of the row names of two spreadsheets and keep the column names of both spreadsheets as they are. The output is two spreadsheets containing only the genes (rows) they have in common.
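
A minimal pandas sketch of this idea follows. The function name `common_rows` and the sample data are hypothetical, and the pipeline's own implementation may differ.

```python
import pandas as pd


def common_rows(df_1, df_2):
    """Return both spreadsheets restricted to the row names they share."""
    shared = df_1.index.intersection(df_2.index)
    # Columns are left untouched; only the rows are filtered.
    return df_1.loc[shared], df_2.loc[shared]


a = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6]], index=["G1", "G2", "G3"], columns=["S1", "S2"]
)
b = pd.DataFrame([[7], [8], [9]], index=["G2", "G3", "G4"], columns=["S3"])

a_common, b_common = common_rows(a, b)
print(a_common)
print(b_common)
```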

3. Subset Genes

Keep only the rows of the initial spreadsheet whose index names appear in a given set of row names.
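
One way to picture this step in pandas is shown below. `select_genes` is a hypothetical helper, and silently ignoring requested names that are absent from the spreadsheet is an assumption of this sketch.

```python
import pandas as pd


def select_genes(spreadsheet_df, gene_list):
    """Keep the rows whose index name appears in `gene_list`."""
    # isin() builds a boolean mask; names missing from the spreadsheet
    # are simply ignored (assumed behavior for this example).
    return spreadsheet_df.loc[spreadsheet_df.index.isin(gene_list)]


genes = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6]], index=["G1", "G2", "G3"], columns=["S1", "S2"]
)
selected = select_genes(genes, ["G3", "G1", "G9"])
print(selected)
```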

4. Union

Merge two phenotype spreadsheets such that the final spreadsheet contains all column names and all row names of both.
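
The sketch below shows one possible way to build such a union with pandas `combine_first`. Letting the first spreadsheet's values win where both define the same cell is an assumption of this example, and `merge_spreadsheets` is a hypothetical name.

```python
import pandas as pd


def merge_spreadsheets(df_1, df_2):
    """Union of rows and columns; cells present in neither input become NaN."""
    # combine_first keeps df_1's values and fills the gaps from df_2.
    return df_1.combine_first(df_2)


a = pd.DataFrame({"age": [50, 61]}, index=["S1", "S2"])
b = pd.DataFrame({"grade": ["low", "high"]}, index=["S2", "S3"])

merged = merge_spreadsheets(a, b)
print(merged)
```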

5. Group Then Apply a Function

Given an expression spreadsheet and a group-samples-by criterion (labels), apply a function to each group of samples, e.g., compute the mean gene value for each sample-cluster assignment.
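
For the cluster-mean case, a rough pandas equivalent might look like this. The function name `cluster_averages`, the genes x samples layout, and the label series are assumptions for illustration.

```python
import pandas as pd


def cluster_averages(spreadsheet_df, labels):
    """Mean of each gene (row) over the samples (columns) in each cluster."""
    # Align the labels to the spreadsheet's column order first.
    aligned = labels.reindex(spreadsheet_df.columns)
    # Group the transposed frame (samples as rows), then transpose back.
    return spreadsheet_df.T.groupby(aligned.values).mean().T


genes = pd.DataFrame(
    [[1.0, 2.0, 5.0, 7.0], [0.0, 4.0, 6.0, 6.0]],
    index=["GENE_A", "GENE_B"],
    columns=["S1", "S2", "S3", "S4"],
)
labels = pd.Series({"S1": 0, "S2": 0, "S3": 1, "S4": 1})

averages = cluster_averages(genes, labels)
print(averages)
```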

6. Spreadsheet numerical transform

Produce a spreadsheet with new numerical values, using a transform such as threshold, log transform, z transform, or absolute value.
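
The sketch below illustrates what these transforms could look like in pandas. The transform names, the log base and offset, the per-row z-score, and the thresholding rule are all assumptions chosen for the example; check the options file for the definitions the pipeline actually uses.

```python
import numpy as np
import pandas as pd


def numerical_transform(df, name, threshold=0.0):
    """Apply one of a few illustrative element-wise or row-wise transforms."""
    if name == "abs":
        return df.abs()
    if name == "log":
        # Assumed form: log base 2 with a +1 offset so that 0 maps to 0.
        return np.log2(df + 1)
    if name == "z_score":
        # Assumed per-row (per-gene) z-score using the population std.
        centered = df.sub(df.mean(axis=1), axis=0)
        return centered.div(df.std(axis=1, ddof=0), axis=0)
    if name == "threshold":
        # Assumed rule: values below the threshold are set to 0.
        return df.where(df >= threshold, 0.0)
    raise ValueError("unknown transform: " + name)


values = pd.DataFrame(
    [[1.0, -2.0, 4.0], [0.0, 3.0, -3.0]],
    index=["GENE_A", "GENE_B"],
    columns=["S1", "S2", "S3"],
)
absolute = numerical_transform(values, "abs")
clipped = numerical_transform(values, "threshold", threshold=1.0)
logged = numerical_transform(absolute, "log")
z_scores = numerical_transform(values, "z_score")
print(z_scores)
```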

7. Spreadsheet statistics

Compute a spreadsheet measure overall, by rows, or by columns, such as min, max, sum, mean, median, standard deviation or variation.
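
A small pandas sketch of the three directions is given below. The function name `stat_value`, the `direction` keywords, and the statistic names (which simply reuse pandas method names) are assumptions for this example.

```python
import pandas as pd

ALLOWED_STATS = {"min", "max", "sum", "mean", "median", "std", "var"}


def stat_value(df, stat, direction="all"):
    """Compute `stat` per column, per row, or over the whole spreadsheet."""
    if stat not in ALLOWED_STATS:
        raise ValueError("unknown statistic: " + stat)
    if direction == "columns":
        return getattr(df, stat)(axis=0)
    if direction == "rows":
        return getattr(df, stat)(axis=1)
    if direction == "all":
        # Flatten every cell into one Series and reduce it to a scalar.
        return getattr(pd.Series(df.values.ravel()), stat)()
    raise ValueError("unknown direction: " + direction)


sheet = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]], index=["GENE_A", "GENE_B"], columns=["S1", "S2"]
)
total = stat_value(sheet, "sum")
row_means = stat_value(sheet, "mean", direction="rows")
col_max = stat_value(sheet, "max", direction="columns")
print(total, list(row_means), list(col_max))
```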

8. Spreadsheet transpose

Spreadsheet rows x columns transposed to columns x rows.

9. Kaplan-Meier

Given a samples x phenotype spreadsheet with cluster ID, event, and time columns, output a Kaplan-Meier plot as a PNG image.
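
To make the quantity behind the plot concrete, here is a hand-rolled sketch of the Kaplan-Meier product-limit estimate, computed per cluster. It does not draw the PNG and is not the pipeline's implementation; the column names `cluster`, `time`, and `event` (1 = event observed, 0 = censored) are assumptions.

```python
import pandas as pd


def kaplan_meier(times, events):
    """Return the survival estimate S(t) at each observed event time."""
    data = pd.DataFrame({"time": list(times), "event": list(events)})
    survival = 1.0
    curve = {}
    event_times = sorted(data.loc[data["event"] == 1, "time"].unique())
    for t in event_times:
        # Samples still under observation just before time t.
        at_risk = int((data["time"] >= t).sum())
        # Events that happen exactly at time t.
        deaths = int(((data["time"] == t) & (data["event"] == 1)).sum())
        survival *= 1.0 - deaths / at_risk
        curve[float(t)] = survival
    return pd.Series(curve)


# Hypothetical samples x phenotype spreadsheet.
pheno = pd.DataFrame(
    {
        "cluster": [0, 0, 0, 0, 1, 1],
        "time": [1, 2, 3, 4, 2, 5],
        "event": [1, 0, 1, 1, 1, 0],
    },
    index=["S1", "S2", "S3", "S4", "S5", "S6"],
)

# One survival curve per cluster ID.
curves = {
    cluster_id: kaplan_meier(group["time"], group["event"])
    for cluster_id, group in pheno.groupby("cluster")
}
print(curves[0])
```

Each curve could then be drawn as a step function, one line per cluster, to obtain a plot of the kind described above.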

10. Category to Binary

Given a samples x phenotype spreadsheet and a selected category column, output a samples x unique-categories binary spreadsheet.
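
In pandas this kind of expansion is commonly done with `get_dummies`. The sketch below assumes a hypothetical helper name `category_to_binary` and toy data; the pipeline's handling of missing values may differ.

```python
import pandas as pd


def category_to_binary(phenotype_df, category):
    """One 0/1 column per unique value found in the `category` column."""
    # astype(int) gives 0/1 regardless of the pandas version's default dtype.
    return pd.get_dummies(phenotype_df[category]).astype(int)


pheno = pd.DataFrame(
    {"grade": ["high", "low", "high"], "age": [50, 61, 47]},
    index=["S1", "S2", "S3"],
)
binary = category_to_binary(pheno, "grade")
print(binary)
```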


How to run this pipeline with our data.


1. Install the following (Ubuntu or Linux).

apt-get install -y python3-pip
apt-get install -y libfreetype6-dev libxft-dev
apt-get install -y libblas-dev liblapack-dev libatlas-base-dev gfortran

pip3 install numpy==1.11.1
pip3 install pandas==0.18.1 
pip3 install scipy==0.18.0
pip3 install scikit-learn==0.17.1
pip3 install matplotlib==1.4.2
pip3 install pyyaml
pip3 install xmlrunner
pip3 install knpackage

2. Clone the Spreadsheets_Transformation github repository to your computer.

 git clone https://github.com/KnowEnG-Research/Spreadsheets_Transformation.git

3. Change directory to Spreadsheets_Transformation.

cd Spreadsheets_Transformation

4. Change to the test directory.

cd test

5. Run "make env_setup" to create a local directory "run_dir" and place all the run files in it.

make env_setup

6. Use one of the following "make" commands to select and run a transformation option:

| Command | Options and input file names |
| --- | --- |
| make run_spreadsheet_transpose | TEST_1_transpose.yml |
| make run_spreadsheets_common_samples | TEST_2_common_samples.yml |
| make run_spreadsheets_merge | TEST_3_merge.yml |
| make run_select_spreadsheet_genes | TEST_4_select_genes.yml |
| make run_spreadsheet_clustering_averages | TEST_5_cluster_averages.yml |
| make run_spreadsheet_select_pheno_categorical | TEST_6_select_categorical.yml |
| make run_numerical_tranform | TEST_7_numerical_transform.yml |
| make run_stat_values | TEST_8_stat_value.yml |
| make run_kaplan_meier | TEST_9_kaplan_meier.yml |
| make run_category_binary | TEST_10_categorical_to_bin.yml |

7. Output files are written to the results directory named in the options file. Each output file is named after its input file(s), with the transformation name and a timestamp appended.


How to run this pipeline with your data.


1. Copy and edit the TEST_..._options.yml file most appropriate for your transformation:

  • include the name and location of your input file(s)
  • set additional options as commented in the file
  • set the path to your results directory
  • suggested directory setup is like that created with make env_setup

2. Run the pipeline from the command line with the edited options file in the run directory.

python3 mini_pipelines.py -run_directory your/run_directory/path -run_file your_options.yml

How to run the Spreadsheets_Transformation.ipynb Jupyter notebook.


1. Install Jupyter and the widgets.

http://jupyter.readthedocs.io/en/latest/install.html

http://ipywidgets.readthedocs.io/en/stable/user_install.html

2. Clone this repository using the directions in step 2 above.

git clone https://github.com/KnowEnG-Research/Spreadsheets_Transformation.git

3. Start the Jupyter notebook server at the command prompt.

jupyter notebook

4. The Jupyter notebook server should open in your default browser. If it does not, follow the directions in the terminal.

  • In the notebook server window, navigate to the directory with the Spreadsheets_Transformation.ipynb notebook and click on it to start it in a new tab.

  • If you don't see a simple page with forms and buttons then you will have to select "Cell" > "Run All" in the Jupyter menu.

  • You may upload your files in the notebook server window or use the default files. The output will be in the "results" directory.

Contributors

dlanier, nahilsobh, noorshalabi
