Giter Club home page Giter Club logo

bedops_parse_star_junctions's Introduction

bedops_parse_star_junctions

Pipeline for taking STAR's SJ.out files and parsing the counts for a given bed of named spliced junctions

Edit the top lines of the .smk file to run correctly

There's two independent parts to the analysis, one will turn STAR's splice junction into bed files, sort them, and then use bedtools plus some awk to give a final output file that looks like this(without the header)

chromosome start end filename_this_count_comes_from count strand name_of_junction_in_your_input
chr19 7168094 7170537 Cont-B_S2.SJ.out 49 - INSR_annotated
chr19 7168094 7170537 Cont-C_S3.SJ.out 30 - INSR_annotated
chr19 7168094 7170537 Cont-D_S4.SJ.out 35 - INSR_annotated
chr19 7168094 7170537 control_fluorescent_2.SJ.out 9 - INSR_annotated
chr19 7168094 7170537 control_fluorescent_3.SJ.out 5 - INSR_annotated
chr19 7168094 7170537 control_none_1.SJ.out 20 - INSR_annotated

To use that part properly you'll want to edit parse_star_junctions.smk And tweak the following input

project_dir - this is a top level folder where the sorted beds, and outputs are going to end up

out_spot - a folder underneath project_dir that will be created, and sorted beds are output is going to appear

bam_spot - pipeline is fairly lazy, it's going to glob wild cards from this folder, so make sure all the samples you want to are in the same folder, (symlinks are fine!)

bam_suffix - suffix of the bams for pattern matching to work

sj_suffix - suffix of your splice junction tables for pattern matching to work

bed_file - a bed file of junctions you want to compare against

final_output_name - a name for your file. the final output file will be located in:

{project_dir}/{out_spot}/{final_output_name}.aggregated.clean.annotated.bed

This will contain only the junctions in bed_file and with the names of the junction and the names of the file it was found it

You'll also have a file called {project_dir}/{out_spot}/{final_output_name}.aggregated.bed

This is all the junctions which overlapped the ones in bed_file, useful to check if you expected junctions that weren't present because you might have an a one-off error.

Basic work flow is that the first rule will convert a SJ.out.tab to a bed file, and put the 'name' of each entry as the name of the file it's in

e.g. if I input a folder with samples called sample01.SJ.out.tab and sample02.SJ.out.tab

I'll get 2 beds that look like this

chrY 57208979 57209532 sample01.SJ.out 0 + chrY 57209059 57209219 sample01.SJ.out 0 +

chrY 57208979 57209532 sample02.SJ.out 0 + chrY 57209059 57209219 sample02.SJ.out 0 +

The second part uses Dasper to annotate relative to a GTF and convert STAR's splice junction counts to percent spliced in.

Feel free to try linking the 2 together but that part is highly developmental yet, so buyer-beware.

bedops_parse_star_junctions's People

Contributors

aleighbrown avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.