rufus's Issues

Investigate variants that RUFUS didn't call from gold-standard dataset - specifically those that had contigs made but no variant calls

Steps to reproduce variants not called by RUFUS that are in the gold-standard data set:
(Notes from S. Gardiner)

Files to reproduce:

I've been looking at the RUFUS run on the merged BAMs of EA/NC/LL, located at:
/scratch/ucgd/lustre-work/marth/u0880188/smaht/hcc1395_seqc2/merged_runs/EA_NC_LL

How to identify variants that do have contigs made, but don't have calls in vcf:

Previously, I ran bedtools coverage on the BAM file that contains the contigs:
/scratch/ucgd/lustre-work/marth/u0880188/smaht/hcc1395_seqc2/merged_runs/EA_NC_LL/EA_NC_LL_3_merged_tumor.bam.generator.V2.overlap.hashcount.fastq.bam
at the specific sites from the validated VCF that RUFUS failed to call.
This gave me this file:
/scratch/ucgd/lustre-work/marth/u0880188/smaht/hcc1395_seqc2/merged_runs/EA_NC_LL/contig_depth.txt
I then used a Python script to pull out the locations that had a coverage of at least 1.
Here is a TSV file of those locations:
/scratch/ucgd/lustre-work/marth/u0880188/smaht/hcc1395_seqc2/merged_runs/EA_NC_LL/contig_variants_validated.tsv

It looks like there were 1209 such variants.
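
The original Python script is not included in these notes. A minimal sketch of the filtering step might look like the following. It assumes contig_depth.txt is in the default tab-separated bedtools coverage layout, where the last four columns are the overlap count, bases covered, interval length, and fraction covered. The function name and the count_col and min_count parameters are hypothetical; if bedtools was run with other options (for example -d or -counts), the column index would need to change.

```python
def sites_with_contig_coverage(lines, count_col=-4, min_count=1):
    """Return the rows whose coverage value is at least min_count.

    Assumption: each line is a tab-separated bedtools coverage record and
    the coverage value of interest sits at index count_col (default -4,
    the overlap count in the default bedtools coverage output).
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and any header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        try:
            count = int(fields[count_col])
        except (ValueError, IndexError):
            # Malformed or short record: ignore it rather than guess.
            continue
        if count >= min_count:
            kept.append(fields)
    return kept


def write_tsv(rows, path):
    """Write the kept rows back out as a tab-separated file."""
    with open(path, "w") as handle:
        for fields in rows:
            handle.write("\t".join(fields) + "\n")
```

Usage would be along the lines of opening contig_depth.txt, passing the file handle to sites_with_contig_coverage, and writing the result with write_tsv; the number of rows returned is the number of missed sites that have at least one contig over them.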

Select data set to run RUFUS warm-up on

Evaluate existing performance of RUFUS:

Option A

  1. Find out sample IDs for PCAWG data used
  2. Get bams and vcfs corresponding to those samples
  3. Set up experimental design of N-T pairs
  4. Run RUFUS on pairs
  5. Accumulate calls and quantify sensitivity & non-overlapping calls (true positives?)
  6. Generate figures to compare first round of PCAWG calls vs. my second round
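
As a rough illustration of step 5, sensitivity and the non-overlapping calls could be computed by comparing two sets of variants. This is only a sketch: it assumes variants are matched exactly on (chrom, pos, ref, alt), with no normalization of indel representation, and the function name compare_calls is hypothetical.

```python
def compare_calls(truth, calls):
    """Compare a call set against a truth set.

    Both arguments are iterables of (chrom, pos, ref, alt) tuples.
    Assumption: exact-match comparison, no variant normalization.
    """
    truth = set(truth)
    calls = set(calls)
    shared = truth & calls          # truth variants that were called
    missed = truth - calls          # truth variants that were not called
    non_overlapping = calls - truth # calls absent from the truth set
    # Sensitivity = called truth variants / all truth variants.
    sensitivity = len(shared) / len(truth) if truth else float("nan")
    return {
        "shared": shared,
        "missed": missed,
        "non_overlapping": non_overlapping,
        "sensitivity": sensitivity,
    }
```

Whether the non-overlapping calls are true positives or false positives cannot be decided from the set comparison alone; they would need separate review.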
