
snakemake-workshop's Introduction

  • 👋 Hi, I’m @irzamsarfraz, a PhD Bioinformatics student in Josh Campbell's Lab at Boston University School of Medicine.
  • 👀 I’m interested in developing computational tools and approaches for analysis of genomic data.
  • 📫 Reach me at [email protected].

snakemake-workshop's People

Contributors

dakota-hawkins, ebriars

snakemake-workshop's Issues

4. Implement plot_clusters rule

Hint: This uses the plot_cells.R script

This rule should do the following to make plots of the clusters found in the single-cell data:

  • Take as input the three output files from the cluster_cells rule
  • Specify which attribute to color the data by (Hint: an attribute is usually a piece of metadata or categorical label for a data point. In this example, the attribute is the column titled "louvain")
  • Output a '.png' plot of clustered data
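
One possible shape for this rule is sketched below. It is not the workshop's reference solution. All file paths, the output and parameter names (`counts`, `cell_meta`, `gene_meta`, `color_by`), and the `scripts/` location are assumptions. It also assumes that plot_cells.R reads its inputs, outputs, and parameters from the `snakemake` object that the `script:` directive provides; if the script parses command-line arguments instead, a `shell:` directive would be needed.

```snakemake
# Sketch only: paths and names below are hypothetical, not taken from the workshop repo.
rule plot_clusters:
    input:
        # The three files assumed to be written by the cluster_cells rule.
        counts="results/clustered/counts.csv",
        cell_meta="results/clustered/cell_metadata.csv",
        gene_meta="results/clustered/gene_metadata.csv",
    output:
        "results/plots/clusters.png"
    params:
        # Column of the cell metadata used to color the points.
        color_by="louvain"
    script:
        # Assumed location; script paths are resolved relative to the Snakefile.
        "scripts/plot_cells.R"
```

Naming the inputs keeps the rule readable and lets the script refer to each file by name rather than by position.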

1. Implement download_data rule

Hint: This rule uses the script download_data.py

This rule should do the following:

  • Download the dataset 'pbmc3k'
  • Output a '.h5ad' file. You should output this file to some type of data directory
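
A minimal sketch of such a rule follows. The output path, the `dataset` parameter name, and the script location are assumptions, as is the idea that download_data.py reads them from the `snakemake` object supplied by the `script:` directive.

```snakemake
# Sketch only: paths and names below are hypothetical, not taken from the workshop repo.
rule download_data:
    output:
        # Raw data kept in its own directory, separate from processed data.
        "data/raw/pbmc3k.h5ad"
    params:
        # Name of the dataset to fetch.
        dataset="pbmc3k"
    script:
        # Assumed location; script paths are resolved relative to the Snakefile.
        "scripts/download_data.py"
```

A rule with no `input:` section like this one is a natural starting point of the workflow, since nothing has to exist before it can run.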

5. Create a config.yaml file to specify the parameters

Up until now, you have hardcoded the required parameters in each rule. A more elegant solution is to have the Snakefile read these parameters from a config.yaml file. In this step you should do the following:

  • Create a config.yaml file
  • Specify the parameters in the config.yaml file
  • Import the config.yaml file into the Snakefile
  • Have the Snakefile use the parameters specified in the config.yaml file
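
The steps above could look roughly like the sketch below, shown for the preprocessing parameters only. The key names in config.yaml, the file paths, and the script location are assumptions; `configfile:` and the global `config` dictionary are standard Snakemake features. The assumed YAML content is shown as comments so the example stays in one file.

```snakemake
# Assumed contents of config.yaml (hypothetical key names):
#
#   preprocess:
#     min_cells: 3
#     min_genes: 200
#     max_pct_mito: 5
#     n_hvgs: 2000

# Load the YAML file; its contents become the global `config` dictionary.
configfile: "config.yaml"

rule preprocess_data:
    input:
        "data/raw/pbmc3k.h5ad"
    output:
        "data/processed/pbmc3k.h5ad"
    params:
        # Values are looked up in config.yaml instead of being hardcoded.
        min_cells=config["preprocess"]["min_cells"],
        min_genes=config["preprocess"]["min_genes"],
        max_pct_mito=config["preprocess"]["max_pct_mito"],
        n_hvgs=config["preprocess"]["n_hvgs"],
    script:
        "scripts/preprocess.py"
```

With this layout, changing a threshold means editing config.yaml only; the Snakefile itself stays untouched.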

6. Generate a snakemake report

Now that the pipeline is working, add functionality to the Snakefile to generate an HTML report. To do this you will need to:

  • Look at the snakemake documentation for the report feature
  • Add a report flag to output(s) you want to display in the report (e.g. a plot)
  • Specify categories for the output(s) in the report (i.e. sections)
  • Run snakemake with the report flag

Hint: snakemake documentation
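
As a rough illustration, a plotting rule's output can be wrapped in Snakemake's `report()` flag, with `category` choosing the report section it appears under. The rule body, the paths, and the category label "Clustering" are assumptions made for this sketch.

```snakemake
# Sketch only: paths, names, and the category label are hypothetical.
rule plot_clusters:
    input:
        counts="results/clustered/counts.csv",
        cell_meta="results/clustered/cell_metadata.csv",
        gene_meta="results/clustered/gene_metadata.csv",
    output:
        # report() marks this file for inclusion in the HTML report;
        # category sets the section it is listed under.
        report("results/plots/clusters.png", category="Clustering")
    params:
        color_by="louvain"
    script:
        "scripts/plot_cells.R"
```

Once the workflow has finished, running `snakemake --report report.html` should write the report to report.html.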

3. Implement the cluster_cells rule

Hint: This uses the cluster_cells.py script

This rule should do the following to cluster the data and output it in an R-readable format:

  • Take the output of the preprocess_data rule as input
  • Specify parameters such that the number of clusters is 15 and the resolution is 1
  • Create three output files for the count matrix, cell metadata, and gene metadata. (Tip: think about directory structure when choosing where this clustered data should save to)
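
A hedged sketch of what this rule might look like is given below. The paths, the CSV file format, the output names, and the parameter names (`n_clusters`, `resolution`) are assumptions, and the sketch presumes that cluster_cells.py reads them from the `snakemake` object provided by the `script:` directive.

```snakemake
# Sketch only: paths and names below are hypothetical, not taken from the workshop repo.
rule cluster_cells:
    input:
        # Assumed output of the preprocess_data rule.
        "data/processed/pbmc3k.h5ad"
    output:
        # Three R-readable files grouped in one directory for clustered results.
        counts="results/clustered/counts.csv",
        cell_meta="results/clustered/cell_metadata.csv",
        gene_meta="results/clustered/gene_metadata.csv",
    params:
        n_clusters=15,
        resolution=1,
    script:
        "scripts/cluster_cells.py"
```

Keeping the three outputs in a single directory makes it easy for a downstream plotting rule to find them together.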

2. Implement the preprocess_data rule

Hint: This rule uses the preprocess.py script

This rule should do the following:

  • Take the output of the download_data rule as input
  • Specify parameters to filter the data such that the minimum number of cells is 3, the minimum number of genes is 200, the maximum percent of mitochondrial reads is 5%, and the number of highly variable genes to retain is 2000
  • Output a processed '.h5ad' file. You should put this in some type of data directory. (Hint: can you specify different directories for raw and processed data?)
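
One way to express these requirements is sketched below, with the parameters hardcoded in the rule. The directory layout, the parameter names, and the script location are assumptions, and preprocess.py is assumed to read them from the `snakemake` object supplied by the `script:` directive.

```snakemake
# Sketch only: paths and names below are hypothetical, not taken from the workshop repo.
rule preprocess_data:
    input:
        # Assumed output of the download_data rule (raw data directory).
        "data/raw/pbmc3k.h5ad"
    output:
        # Processed data goes to a separate directory from the raw download.
        "data/processed/pbmc3k.h5ad"
    params:
        min_cells=3,       # minimum number of cells
        min_genes=200,     # minimum number of genes
        max_pct_mito=5,    # maximum percent of mitochondrial reads
        n_hvgs=2000,       # number of highly variable genes to retain
    script:
        "scripts/preprocess.py"
```

Because the input path matches the output of the download rule, Snakemake can infer that download_data must run before preprocess_data.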
