irzamsarfraz / snakemake-workshop

This project forked from brite-reu/snakemake-workshop

Workflow managers are good mkay

Shell 4.08% Python 82.02% R 13.90%

snakemake-workshop's Introduction

👋 Hi, I’m @irzamsarfraz, a PhD Bioinformatics student in Josh Campbell's Lab at Boston University School of Medicine.
👀 I’m interested in developing computational tools and approaches for analysis of genomic data.
📫 Reach me at [email protected].

snakemake-workshop's Issues

4. Implement plot_clusters rule

Hint: This uses the plot_cells.R script

This rule should do the following to make plots of the clusters found in the single-cell data:

Take as input the three output files from the cluster_cells rule
Specify which attribute to color the data by. (Hint: an attribute is usually a piece of metadata or a categorical label attached to each data point. In this example, the attribute is the column titled "louvain".)
Output a '.png' plot of the clustered data
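Coloring by an attribute means that every distinct value in that metadata column (here, every Louvain cluster label) gets its own color. The stdlib-only Python sketch below illustrates that grouping step on cell metadata stored as CSV text. It is not the workshop's plot_cells.R script: the CSV layout (cell ID in the first column), the helper names `group_by_attribute` and `assign_colors`, and the palette are all hypothetical.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict

# Hypothetical palette; any sequence of colors would work
PALETTE = ("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728")


def group_by_attribute(metadata_csv: str, attribute: str = "louvain") -> dict[str, list[str]]:
    """Group cell IDs by the value of one metadata column (hypothetical helper).

    Assumes the first column of the CSV holds the cell ID.
    """
    reader = csv.DictReader(io.StringIO(metadata_csv))
    id_column = reader.fieldnames[0]
    groups: dict[str, list[str]] = defaultdict(list)
    for row in reader:
        groups[row[attribute]].append(row[id_column])
    return dict(groups)


def assign_colors(labels, palette=PALETTE) -> dict[str, str]:
    """Map each label to a color, cycling through the palette in sorted label order."""
    return {label: palette[i % len(palette)] for i, label in enumerate(sorted(labels))}


example = "cell,n_genes,louvain\nAAAC-1,812,0\nAAAG-1,1050,1\nAACT-1,900,0\n"
groups = group_by_attribute(example)
print(groups)  # {'0': ['AAAC-1', 'AACT-1'], '1': ['AAAG-1']}
print(assign_colors(groups))  # {'0': '#1f77b4', '1': '#ff7f0e'}
```

Changing the `attribute` argument is all it takes to color by a different metadata column, which is why the rule should expose it as a parameter rather than bury it in the script.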

1. Implement download_data rule

Hint: This rule uses the script download_data.py

This rule should do the following:

Download the dataset 'pbmc3k'
Output a '.h5ad' file, saved to a data directory
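One way to follow the data-directory advice, and to prepare for the raw-versus-processed split in the preprocess_data step, is to give each pipeline stage its own folder. The Python sketch below encodes one such layout; the directory names and the `h5ad_path` helper are hypothetical choices, not something the workshop prescribes. In a Snakefile, the resulting string (for example data/raw/pbmc3k.h5ad) would go in the rule's output: section, and Snakemake creates missing parent directories of output files before running the rule.

```python
from pathlib import Path

# Hypothetical layout: one directory per stage that produces an .h5ad file
STAGE_DIRS = {
    "raw": Path("data") / "raw",              # written by download_data
    "processed": Path("data") / "processed",  # written by preprocess_data
}


def h5ad_path(stage: str, dataset: str = "pbmc3k") -> Path:
    """Return where the .h5ad file for `dataset` lives at a given stage."""
    try:
        return STAGE_DIRS[stage] / f"{dataset}.h5ad"
    except KeyError:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(STAGE_DIRS)}") from None


print(h5ad_path("raw").as_posix())        # data/raw/pbmc3k.h5ad
print(h5ad_path("processed").as_posix())  # data/processed/pbmc3k.h5ad
```

Keeping the dataset name as a parameter, rather than baking "pbmc3k" into every path, makes the later move to wildcards in step 7 much easier.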

5. Create a config.yaml file to specify the parameters

Up until now, you have hard-coded the required parameters into each rule. A more elegant solution is to have the Snakefile read these parameters from a config.yaml file. In this step you should do the following:

Create a config.yaml file
Specify the parameters in the config.yaml file
Import the config.yaml file into the Snakefile
Have the Snakefile use the parameters specified in the config.yaml file
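Snakemake loads a configuration file through the configfile: directive (for example configfile: "config.yaml") and exposes its contents to the Snakefile as a dictionary named config, so a rule can use a value such as config["preprocess"]["min_genes"] in its params: section. Snakemake reads config files in YAML or JSON format. Because Python's standard library has no YAML parser, the sketch below round-trips the workshop's parameters through JSON to show the nested-dictionary shape. The key names are hypothetical; the values are the ones given in steps 2 through 4.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical key names; values taken from the preprocess, cluster and plot steps
PARAMS = {
    "dataset": "pbmc3k",
    "preprocess": {"min_cells": 3, "min_genes": 200, "max_pct_mito": 5, "n_top_genes": 2000},
    "cluster": {"n_clusters": 15, "resolution": 1},
    "plot": {"attribute": "louvain"},
}


def write_config(path: Path, params: dict) -> None:
    """Serialize the parameter dictionary to a JSON config file."""
    path.write_text(json.dumps(params, indent=2))


def load_config(path: Path) -> dict:
    """Read the config file back into a nested dictionary, like Snakemake's `config`."""
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config.json"
    write_config(cfg_path, PARAMS)
    config = load_config(cfg_path)
    print(config["preprocess"]["min_genes"])  # 200
    print(config["plot"]["attribute"])        # louvain
```

With PyYAML installed, yaml.safe_load would read an actual config.yaml into the same kind of nested dictionary.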

6. Generate a snakemake report

Now that the pipeline is working, add functionality to the Snakefile to generate an HTML report. To do this you will need to:

Look at the snakemake documentation for the report feature
Add a report flag to output(s) you want to display in the report (e.g. a plot)
Specify categories for the output(s) in the report (i.e. sections)
Run snakemake with the --report command-line option (e.g. snakemake --report report.html)

Hint: snakemake documentation

3. Implement the cluster_cells rule

Hint: This uses the cluster_cells.py script

This rule should do the following to cluster the data and output it in an R-readable format:

Take the output of the preprocess_data rule as input
Specify parameters such that the number of clusters is 15 and the resolution is 1
Create three output files for the count matrix, cell metadata, and gene metadata. (Tip: think about directory structure when choosing where to save this clustered data.)
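R-based tools such as Seurat and Bioconductor's SingleCellExperiment store counts with genes as rows and cells as columns, whereas AnnData (the .h5ad format) stores cells as rows. The stdlib-only sketch below shows one possible R-readable format: three CSV files. The file names, the `write_r_readable` helper and its argument shapes are hypothetical, and the actual cluster_cells.py script may write a different format.

```python
from __future__ import annotations

import csv
import tempfile
from pathlib import Path


def _write_table(path: Path, id_name: str, ids: list[str], columns: dict[str, list]) -> None:
    """Write one row per ID: the ID first, then one value per metadata column."""
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([id_name, *columns])
        for i, item_id in enumerate(ids):
            writer.writerow([item_id, *(values[i] for values in columns.values())])


def write_r_readable(out_dir: Path, counts: list[list[int]], cell_ids: list[str],
                     gene_ids: list[str], cell_meta: dict[str, list],
                     gene_meta: dict[str, list]) -> list[Path]:
    """Write counts (genes x cells), cell metadata and gene metadata as three CSVs."""
    out_dir.mkdir(parents=True, exist_ok=True)
    counts_path = out_dir / "counts.csv"
    with counts_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["gene", *cell_ids])
        for gene, row in zip(gene_ids, counts):  # one row per gene
            writer.writerow([gene, *row])
    cell_path = out_dir / "cell_metadata.csv"
    gene_path = out_dir / "gene_metadata.csv"
    _write_table(cell_path, "cell", cell_ids, cell_meta)
    _write_table(gene_path, "gene", gene_ids, gene_meta)
    return [counts_path, cell_path, gene_path]


with tempfile.TemporaryDirectory() as tmp:
    paths = write_r_readable(
        Path(tmp) / "clustered" / "pbmc3k",
        counts=[[1, 0], [3, 2]],  # 2 genes x 2 cells
        cell_ids=["c1", "c2"],
        gene_ids=["g1", "g2"],
        cell_meta={"louvain": ["0", "1"]},
        gene_meta={"highly_variable": [True, False]},
    )
    print(paths[0].read_text().splitlines())  # ['gene,c1,c2', 'g1,1,0', 'g2,3,2']
```

In R, read.csv("counts.csv", row.names = 1, check.names = FALSE) would then load a genes-by-cells table while leaving barcodes such as AAAC-1 unaltered. Listing all three files in the rule's output: section lets Snakemake check that each one was actually produced.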

2. Implement the preprocess_data rule

Hint: This rule uses the preprocess.py script

This rule should do the following:

Take the output of the download_data rule as input
Specify parameters to filter the data such that the minimum number of cells is 3, the minimum number of genes is 200, the maximum percent of mitochondrial reads is 5%, and the number of highly variable genes to retain is 2000
Output a processed '.h5ad' file to a data directory. (Hint: can you specify different directories for raw and processed data?)
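In scanpy, these thresholds typically correspond to sc.pp.filter_cells(adata, min_genes=200), sc.pp.filter_genes(adata, min_cells=3), a cutoff on the percentage of counts from mitochondrial genes, and sc.pp.highly_variable_genes(adata, n_top_genes=2000); preprocess.py may implement them differently. To make the logic of the filters concrete, the stdlib-only sketch below applies them to a small cells-by-genes count table. The helper names are hypothetical, the "MT-" prefix assumes human gene symbols, and `top_variable_genes` ranks genes by plain variance, a deliberate simplification rather than scanpy's highly-variable-gene method.

```python
from statistics import pvariance


def qc_filter(counts, gene_names, min_genes=200, min_cells=3, max_pct_mito=5.0, mito_prefix="MT-"):
    """Filter a cells x genes count table; returns (filtered_counts, kept_gene_names)."""
    # 1. Keep cells in which at least `min_genes` genes are detected (count > 0)
    cells = [row for row in counts if sum(v > 0 for v in row) >= min_genes]
    # 2. Keep genes detected in at least `min_cells` of the remaining cells
    keep = [j for j in range(len(gene_names)) if sum(row[j] > 0 for row in cells) >= min_cells]
    cells = [[row[j] for j in keep] for row in cells]
    genes = [gene_names[j] for j in keep]
    # 3. Drop cells whose mitochondrial share of total counts reaches `max_pct_mito` percent
    mito = [j for j, name in enumerate(genes) if name.startswith(mito_prefix)]

    def pct_mito(row):
        total = sum(row)
        return 100 * sum(row[j] for j in mito) / total if total else 0.0

    cells = [row for row in cells if pct_mito(row) < max_pct_mito]
    return cells, genes


def top_variable_genes(counts, gene_names, n_top=2000):
    """Simplified stand-in for HVG selection: the n_top genes with the largest variance."""
    variances = [pvariance([row[j] for row in counts]) for j in range(len(gene_names))]
    order = sorted(range(len(gene_names)), key=lambda j: variances[j], reverse=True)
    return [gene_names[j] for j in order[:n_top]]


# Toy example with relaxed thresholds so the effect of each filter is visible
toy = [[5, 3, 0], [1, 0, 0], [1, 0, 9]]  # 3 cells x 3 genes
cells, genes = qc_filter(toy, ["A", "B", "MT-CO1"], min_genes=2, min_cells=1, max_pct_mito=50)
print(cells, genes)  # [[5, 3, 0]] ['A', 'B', 'MT-CO1']
```

The second cell is dropped for detecting only one gene, and the third for having 90% of its counts in a mitochondrial gene. Exposing each threshold as a rule parameter is what lets step 5 move them into config.yaml.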

7. Modify the Snakefile to generalize rules for multiple data sets

Right now, your Snakefile processes a single data set that is hard-coded into it. A more realistic use of a snakemake pipeline is to apply it to multiple data sets.

Here, use wildcards to generalize the rules for multiple data sets.

Hint: snakemake wildcards doc
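With wildcards, a rule's output is written as a pattern such as data/processed/{dataset}.h5ad. Snakemake infers the value of {dataset} by matching each requested file name against that pattern, and a target rule (commonly called all) typically lists the final files with expand(). The sketch below imitates both pieces in plain Python to show the mechanics: `match_wildcards` and this `expand` are simplified, hypothetical re-implementations rather than Snakemake's own code, and "other_dataset" is a placeholder name. As with Snakemake's defaults, each wildcard matches the regular expression .+, and expand combines values as a Cartesian product.

```python
from __future__ import annotations

import re
from itertools import product


def match_wildcards(pattern: str, path: str) -> dict[str, str] | None:
    """Return wildcard values if `path` matches `pattern`, else None (simplified)."""
    # re.split with a capturing group alternates literal text and wildcard names
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"  # each wildcard matches .+
        for i, part in enumerate(parts)
    )
    match = re.fullmatch(regex, path)
    return match.groupdict() if match else None


def expand(pattern: str, **wildcards) -> list[str]:
    """Fill `pattern` with every combination of the given wildcard values (simplified)."""
    names = list(wildcards)
    # A single string counts as one value; any other iterable supplies several values
    values = [[v] if isinstance(v, str) else list(v) for v in wildcards.values()]
    return [pattern.format(**dict(zip(names, combo))) for combo in product(*values)]


print(match_wildcards("data/processed/{dataset}.h5ad", "data/processed/pbmc3k.h5ad"))
# {'dataset': 'pbmc3k'}
print(expand("plots/{dataset}_{attribute}.png", dataset=["pbmc3k", "other_dataset"], attribute="louvain"))
# ['plots/pbmc3k_louvain.png', 'plots/other_dataset_louvain.png']
```

Because every rule shares the same {dataset} wildcard, requesting one final plot per data set is enough for Snakemake to work backwards through cluster_cells, preprocess_data and download_data for each of them.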
