access_data_analysis's People

Contributors

carmelinacharalambous, karthigayini, murphycj2, peteryzheng, rhshah

access_data_analysis's Issues

This has to be unique.

sample.type = sample.sheet[Sample_Barcode == sample.name]$Sample_Type

Otherwise the mutate function errors out when the lookup returns more than one value:

❯ Rscript ./filter_calls.R -m $PWD/access_data_analysis_inputs.tsv -o $PWD/result_27Jan2022
---------------
Arguments input:
/home/shahr2/bergerlab/Project_12672_B/small_variants/access_data_analysis_inputs.tsv
/home/shahr2/bergerlab/Project_12672_B/small_variants/result_27Jan2022
/juno/work/access/production/resources/dmp_signedout_CH/current/signedout_CH.txt
stringent
---------------
[1] "Processing patient C-E3C1KC"
[1] "list"
Error: Column `Tumor_Sample_Barcode` must be length 36 (the number of rows) or one, not 2
$`suppressWarnings(filter_calls(fread(master.ref), results.dir, chlist, crite`
<environment: 0x5560b0cd5958>

$`withCallingHandlers(expr, warning = function(w) if (inherits(w, classes)) t`
<environment: 0x5560b0cdb398>

$`filter_calls(fread(master.ref), results.dir, chlist, criteria)`
<environment: 0x5560b0cd9f08>

$`lapply(unique(master.ref$cmo_patient_id), function(x) {\n    print(paste0("P`
<environment: 0x5560b0cdd360>

$`FUN(X[[i]], ...)`
<environment: 0x5560b0d22688>

$`do.call(rbind, lapply(fillouts.filenames, function(y) {\n    sample.name = g`
<environment: 0x5560b0d22148>

$`eval(lhs, parent, parent)`
<environment: 0x5560b0d2c308>

$`eval(lhs, parent, parent)`
<environment: 0x5560b0d22688>

$`do.call(rbind, lapply(fillouts.filenames, function(y) {\n    sample.name = g`
<environment: 0x5560b0d2fd90>

$`lapply(fillouts.filenames, function(y) {\n    sample.name = gsub(".*./|-ORG.`
<environment: 0x5560b0d33c08>

$`FUN(X[[i]], ...)`
<environment: 0x5560b0d3b690>

$`maf.file %>% mutate(Tumor_Sample_Barcode = paste0(sample.name, "___", sampl`
<environment: 0x5560b0d3b348>

$`withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))`
<environment: 0x5560b0d3d150>

$`eval(quote(`_fseq`(`_lhs`)), env, env)`
<environment: 0x5560b0d3cd98>

$`eval(quote(`_fseq`(`_lhs`)), env, env)`
<environment: 0x5560b0d3ad28>

$``_fseq`(`_lhs`)`
<environment: 0x5560b0d40420>

$`freduce(value, `_function_list`)`
<environment: 0x5560b0d40298>

$`function_list[[i]](value)`
<environment: 0x5560b0d3fce8>

$`mutate(., Tumor_Sample_Barcode = paste0(sample.name, "___", sample.type))`
<environment: 0x5560b0d3f9d8>

$`mutate.data.frame(., Tumor_Sample_Barcode = paste0(sample.name, "___", samp`
<environment: 0x5560b0d3f3b8>

$`as.data.frame(mutate(tbl_df(.data), ...))`
<environment: 0x5560b0d3ef20>

$`mutate(tbl_df(.data), ...)`
<environment: 0x5560b0d3eaf8>

$`mutate.tbl_df(tbl_df(.data), ...)`
<environment: 0x5560b0d42490>

$`mutate_impl(.data, dots, caller_env())`
<environment: 0x5560b0d41c40>

$`stop(list("Column `Tumor_Sample_Barcode` must be length 36 (the number of r`
<environment: 0x5560b0d413f0>

attr(,"error.message")
[1] "Error: Column `Tumor_Sample_Barcode` must be length 36 (the number of rows) or one, not 2\n"
attr(,"class")

Solution:

sample.type = unique(sample.sheet[Sample_Barcode == sample.name]$Sample_Type)
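A minimal sketch of the same lookup with an explicit guard, in base R. The helper name get_sample_type and the toy sample sheet are assumptions made for illustration; they are not part of filter_calls.R. unique() collapses duplicated rows for a barcode, and the length check turns conflicting sample types into a readable error instead of a failure inside mutate.

```r
# Hypothetical helper (not from filter_calls.R): look up one Sample_Type per barcode.
get_sample_type <- function(sample.sheet, sample.name) {
  rows <- which(sample.sheet$Sample_Barcode == sample.name)
  sample.type <- unique(sample.sheet$Sample_Type[rows])
  if (length(sample.type) != 1) {
    stop("Expected exactly one Sample_Type for ", sample.name,
         ", found ", length(sample.type), call. = FALSE)
  }
  sample.type
}

# Toy sample sheet: S1 is listed twice with the same type, S2 with conflicting types.
sample.sheet <- data.frame(
  Sample_Barcode = c("S1", "S1", "S2", "S2"),
  Sample_Type = c("plasma", "plasma", "plasma", "tumor"),
  stringsAsFactors = FALSE
)

s1.type <- get_sample_type(sample.sheet, "S1")
s2.result <- tryCatch(get_sample_type(sample.sheet, "S2"),
                      error = function(e) conditionMessage(e))
print(s1.type)
print(s2.result)
```

A duplicated barcode with a single type resolves to a length-one value that mutate can recycle; a barcode with two different types is reported by name.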

DMP ID

For compile_reads.R:

  • If there is no DMP ID, what should the Master REF contain in that column?
  • If a DMP ID is present but missing from the 12-245 key file, the script should exit gracefully with a clear error message
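One way the second point could look, as a hedged base R sketch. The function name check_dmp_ids, the placeholder IDs, and the choice to treat NA or empty strings as "no DMP ID" are all assumptions; none of this is taken from compile_reads.R.

```r
# Hypothetical check (not from compile_reads.R): every non-empty DMP ID
# in the Master REF must also appear in the key file.
check_dmp_ids <- function(dmp_ids, key_ids) {
  # Assumption: NA or "" means the patient has no DMP ID, which is allowed.
  present <- dmp_ids[!is.na(dmp_ids) & nzchar(dmp_ids)]
  missing_ids <- setdiff(present, key_ids)
  if (length(missing_ids) > 0) {
    stop("DMP ID(s) not found in the key file: ",
         paste(missing_ids, collapse = ", "), call. = FALSE)
  }
  invisible(present)
}

# Placeholder IDs for illustration only.
key_ids <- c("P-0000001", "P-0000002")
ok <- check_dmp_ids(c("P-0000001", NA, ""), key_ids)
failure <- tryCatch(check_dmp_ids(c("P-0000001", "P-0000003"), key_ids),
                    error = function(e) conditionMessage(e))
print(failure)

# In a script run with Rscript, the error could be turned into a clean exit:
# tryCatch(check_dmp_ids(ids, key_ids),
#          error = function(e) { message(conditionMessage(e)); quit(status = 1) })
```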

Discrepancy between the README on GitHub and --help: --id is --sid on the develop branch

(base) python get_cbioportal_variants.py --help
Usage: get_cbioportal_variants.py [OPTIONS]

  Tool to do the following operations: A. Get subset of variants based on
  Tumor_Sample_Barcode in MAF file B. Mark the variants as overlapping with
  BED file as covered [yes/no], by appending "covered" column to the subset
  MAF

  Requirement: pandas; typing; typer; bed_lookup(https://github.com/msk-
  access/python_bed_lookup)

Options:
  -m, --maf FILE        MAF file generated by cBioportal repo  [default: /work
                        /access/production/resources/cbioportal/current/msk_so
                        lid_heme/data_mutations_extended.txt]

  -i, --ids PATH        List of ids to search for in the
                        'Tumor_Sample_Barcode' column. Header of this file is
                        'sample_id'  [default: ]

  --sid TEXT            Identifiers to search for in the
                        'Tumor_Sample_Barcode' column. Can be given multiple
                        times  [default: ]

  -b, --bed FILE        BED file to find overlapping variants  [default:
                        /work/access/production/resources/msk-
                        access/current/regions_of_interest/current/MSK-
                        ACCESS-v1_0-probe-A.sorted.bed]

  -n, --name TEXT       Name of the output file  [default: output.maf]
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.

  --help                Show this message and exit.

Incorporation of Hotspot list and CCF information

Problem: There are multiple mutations to view in plot_events.

Solutions:

  • Color only the hotspot mutations rather than all of them.
  • Use CCF to distinguish clonal from subclonal mutations and color them accordingly.

compile_reads error: "can't set ALTREP truelength"

I get the following error message when I try to run compile_reads.R:

Error in .shallow(x, cols = cols, retain.key = TRUE) :
can't set ALTREP truelength

My command:
Rscript R/compile_reads.R -m /juno/work/bergerm1/bergerlab/access_projects/Project_06302_TDM1/metadata/for_access_data_analysis.2020-08-20.csv -o /juno/work/bergerm1/bergerlab/access_projects/Project_06302_TDM1/analysis_workflow_results

create_report script template.Rmd

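Lines 239 and 241. In the second statement, final$Reference_Allele,1,3 passes 1 and 3 to paste0 as separate arguments, so the reference allele is followed by a literal "13" in VarName. The ,1,3 looks like a leftover from a substr() call: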

final[is.na(final$HGVSp_Short) & nchar(final$Reference_Allele)>5,"VarName"] <- paste0(final$Hugo_Symbol, " ", final$Chromosome, ":", final$Start_Position, " ", substr(final$Reference_Allele,1,3),"..", ">", final$Tumor_Seq_Allele2)[is.na(final$HGVSp_Short) & nchar(final$Reference_Allele)>5 ]

final[is.na(final$HGVSp_Short) & nchar(final$Tumor_Seq_Allele2)>5,"VarName"] <- paste0(final$Hugo_Symbol, " ", final$Chromosome, ":", final$Start_Position, " ", final$Reference_Allele,1,3, ">", substr(final$Tumor_Seq_Allele2,1,3),"..")[is.na(final$HGVSp_Short) & nchar(final$Tumor_Seq_Allele2)>5]
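A hedged sketch of how both statements could share one truncation rule, in base R. The helper shorten_allele and the toy data frame are assumptions for illustration and are not part of template.Rmd. Unlike the original two statements, which only touch rows with an allele longer than 5 characters, this sketch names every row that lacks HGVSp_Short and truncates whichever allele is long.

```r
# Hypothetical helper (not from template.Rmd): keep the first `keep` bases
# of any allele longer than `max_len` and append ".." to mark the truncation.
shorten_allele <- function(allele, max_len = 5, keep = 3) {
  ifelse(nchar(allele) > max_len,
         paste0(substr(allele, 1, keep), ".."),
         allele)
}

# Toy data: one long reference allele, one long alternate allele.
final <- data.frame(
  Hugo_Symbol = c("TP53", "EGFR"),
  Chromosome = c("17", "7"),
  Start_Position = c(100, 200),
  Reference_Allele = c("ACGTACGT", "G"),
  Tumor_Seq_Allele2 = c("A", "GTTTTTTT"),
  HGVSp_Short = NA_character_,
  stringsAsFactors = FALSE
)

needs_name <- is.na(final$HGVSp_Short)
final$VarName <- NA_character_
final$VarName[needs_name] <- paste0(
  final$Hugo_Symbol, " ", final$Chromosome, ":", final$Start_Position, " ",
  shorten_allele(final$Reference_Allele), ">",
  shorten_allele(final$Tumor_Seq_Allele2)
)[needs_name]
print(final$VarName)
```

Routing both alleles through the same helper removes the hand-written substr() call, which is where the stray arguments crept in.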

Add mutation called status for each IMPACT sample

This would be useful for patients with multiple IMPACT samples. For example, if a mutation was called in one IMPACT sample and genotyped in the other, we currently cannot easily tell that from the Excel files.

Installation README

Here is what I did after installing conda using this guide:

conda create --name access_data_analysis python=3
conda activate access_data_analysis
conda install r-essentials r-base
conda install r-argparse
pip install genotype-variants

Have the report show multiple IMPACT samples

Some patients have multiple IMPACT samples. One way to handle this is to have a separate tab for each IMPACT sample, since it is not clear how to correct the VAFs when there are multiple IMPACT samples.

compile reads issues

Reported by @kanika-arora. Tried with the master branch. To reproduce:

Rscript ~/tools/access_data_analysis/R/compile_reads.R \
  -m /juno/work/bergerm1/bergerlab/access_projects/Project_06302_TDM1/metadata/C-F38KR6_for-access-data-analysis.csv \
  -o /home/murphyc4/test/ \
  -pid Project_06302_TDM1

The error message:

Error in rbindlist(l, use.names, fill, idcol) : 
 Column 150 ['C-F38KR6-L002-DUPLEX'] of item 2 is missing in item 1. Use fill=TRUE to fill with NA (NULL for list columns), or use.names=FALSE to ignore column names.
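The error message itself points at data.table's rbindlist(l, fill = TRUE). The sketch below shows the same idea in base R so that it runs without extra packages: pad each table with NA columns until all tables share one column set, then bind the rows. The helper rbind_fill, the toy tables, and the L001 column name are assumptions for illustration; only the L002 column name comes from the error above. Whether NA is the right value for a sample missing from one table is a separate question for compile_reads.R.

```r
# Hypothetical helper: bind data frames whose column sets differ,
# filling columns that a table lacks with NA.
rbind_fill <- function(tables) {
  all_cols <- unique(unlist(lapply(tables, names)))
  padded <- lapply(tables, function(tbl) {
    for (col in setdiff(all_cols, names(tbl))) {
      tbl[[col]] <- rep(NA, nrow(tbl))
    }
    tbl[, all_cols, drop = FALSE]  # same column order in every table
  })
  do.call(rbind, padded)
}

# Toy tables: item 1 lacks the L002 column that item 2 has.
item1 <- data.frame(Hugo_Symbol = "TP53", `C-F38KR6-L001-DUPLEX` = 3,
                    check.names = FALSE, stringsAsFactors = FALSE)
item2 <- data.frame(Hugo_Symbol = "KRAS", `C-F38KR6-L001-DUPLEX` = 1,
                    `C-F38KR6-L002-DUPLEX` = 7,
                    check.names = FALSE, stringsAsFactors = FALSE)

combined <- rbind_fill(list(item1, item2))
print(combined)
```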

Donor bams setup

I just realized I am genotyping duplex bams in /ifs/work/bergerm1/ACCESS-Projects/novaseq_curated_duplex_v2/ as standard bams...

If we are going to genotype donor bams as actual plasma samples, do we need both duplex and simplex bams?

filter_reads.R

This may be due to R version inconsistencies (warning: package 'dplyr' was built under R version 3.6.3). filter_calls.sh stops at lines 99-102 with the error message: Error in .shallow(x, cols = cols, retain.key = TRUE) : can't set ALTREP truelength

https://github.com/msk-access/access_data_analysis/blob/master/R/filter_calls.R#L99

I got it working by commenting those lines out and doing this instead:

maf.file <- data.frame(maf.file)
maf.file$Tumor_Sample_Barcode <- paste0(sample.name, '___',sample.type)
maf.file <- cbind(maf.file,data.frame(t_alt_count= maf.file$t_alt_count_standard))
maf.file <- cbind(maf.file,data.frame(t_total_count= maf.file$t_total_count_standard))
maf.file <- data.table(maf.file)

Not sure why similar things work at other places but not here.

collection_date column

Need to make sure the plot_all_event script accommodates both date and character types in the collection_date column.
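A hedged base R sketch of one way to accept both kinds of input. The function name normalize_collection_date, the list of date formats, and the fallback to an ordered-by-appearance factor are assumptions; the plotting script may need different behavior. Values that all parse as dates come back as Date; anything else (for example free-text time-point labels) is kept as a factor in its original order.

```r
# Hypothetical helper: return a Date vector when every non-missing value
# parses as a date, otherwise a factor that preserves the input order.
normalize_collection_date <- function(x, formats = c("%Y-%m-%d", "%m/%d/%Y")) {
  if (inherits(x, "Date")) return(x)
  x <- as.character(x)
  parsed <- as.Date(rep(NA_character_, length(x)))
  for (fmt in formats) {
    todo <- is.na(parsed) & !is.na(x)
    parsed[todo] <- as.Date(x[todo], format = fmt)
  }
  if (all(!is.na(parsed) | is.na(x))) return(parsed)
  factor(x, levels = unique(x))
}

dates <- normalize_collection_date(c("2020-08-20", "08/21/2020"))
labels <- normalize_collection_date(c("baseline", "cycle 2", "baseline"))
print(dates)
print(labels)
```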
