Comments (9)
Hi, just checking in about the above question too. The key issue is: every time we have a new rare disease family, can we avoid rerunning the entire analysis for all the many control samples and instead start from some intermediate step? This is analogous to the n + 1 problem in joint genotyping analysis. Thanks.
from drop.
Hi, for the time being, you have to keep all the BAM files. The BAM files are the main input of DROP. Snakemake (and the way we designed DROP) checks that the BAM files of the samples that are going to be processed exist, and then begins with the analysis.
Nevertheless, if you add a new sample, only this one will be counted (both for gene-level and split reads) and then merged with the rest. The other ones will not be re-counted, but the BAM files must exist.
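To illustrate the behaviour described above, here is a minimal sketch in Python. It is not DROP's or Snakemake's actual code: the function name `plan_counting`, the dictionary layout, and the idea of passing in a set of already-counted sample IDs are all assumptions made for illustration. It only shows the logic: every BAM file must exist, but only samples without existing counts are counted.

```python
from pathlib import Path


def plan_counting(samples, counted_ids):
    """Split annotated samples into those to count and those to reuse.

    samples: dict mapping sample ID -> BAM path (hypothetical layout).
    counted_ids: set of sample IDs whose counts already exist.
    Raises FileNotFoundError if any BAM file is missing, mirroring the
    requirement that every BAM must exist even if it is not re-counted.
    """
    # Every annotated BAM has to be present, counted or not.
    missing = [sid for sid, bam in samples.items() if not Path(bam).exists()]
    if missing:
        raise FileNotFoundError(f"BAM files missing for: {sorted(missing)}")

    # Only samples without existing counts need the counting step.
    to_count = sorted(sid for sid in samples if sid not in counted_ids)
    reused = sorted(sid for sid in samples if sid in counted_ids)
    return to_count, reused
```

With 100 previously counted controls and one new sample, `to_count` would contain only the new sample ID, while a deleted control BAM would still raise an error.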
from drop.
I see. How do I configure the pipeline so it merges the new samples with the prior samples? Does it have to be in the same master directory of the original analysis, with a new config.yaml file?
Regardless, it would be good to have an option to skip the counting step so that the original BAM files don't have to be kept for control samples.
from drop.
You have to add the new samples as rows in the sample annotation and assign them to the corresponding DROP_GROUP that you want to merge them with. Then, Snakemake will recognize that there are new processes to be done.
Yes, we're considering that option; it would also be useful when merging with external counts.
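Adding the rows to the sample annotation could be scripted roughly as follows. This is a sketch under assumptions, not an official DROP helper: it assumes the annotation is a tab-separated file with an `RNA_ID` column identifying each sample, and the function name `add_samples` is hypothetical. Columns missing from a new row are left empty.

```python
import csv


def add_samples(annotation_path, new_rows):
    """Append rows to a tab-separated sample annotation, skipping known IDs.

    annotation_path: path to the annotation file (assumed TSV with header).
    new_rows: list of dicts keyed by column name; RNA_ID is assumed to be
    the sample identifier column.
    Returns the list of RNA_IDs that were actually added.
    """
    with open(annotation_path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        fieldnames = reader.fieldnames
        rows = list(reader)

    known = {row["RNA_ID"] for row in rows}
    added = []
    for new in new_rows:
        if new["RNA_ID"] in known:
            continue  # sample is already annotated, leave it untouched
        # Keep the existing column order; fill absent columns with "".
        rows.append({name: new.get(name, "") for name in fieldnames})
        known.add(new["RNA_ID"])
        added.append(new["RNA_ID"])

    with open(annotation_path, "w", newline="") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
    return added
```

Each new row would carry the same group value in the group column as the samples it should be merged with, so that the existing rows stay unchanged and only the new samples appear as additional work.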
from drop.
If I add a new sample to the sample annotation, does it have to be in the same original drop analysis folder? I'm guessing yes, but just want to make sure.
from drop.
What exactly has to be in the same original analysis folder?
Every time a new analysis is run, everything is rewritten in the processed_data and processed_results folders. A new copy of the OUTRIDER data set (ods) object is saved, but not of the FRASER data set (fds) object, because it's too big.
from drop.
To clarify: suppose I start with one analysis of 10 of my samples + 100 control samples.
Then I want to run another analysis with 5 new samples together with the previous 10 samples from our lab + 100 control samples.
How do I set this up exactly? Do I just change the sample annotation table in the same DROP project folder of the first analysis? Because above you wrote that Snakemake can do this without having to recalculate all the processing for the samples from the first analysis of 10 + 100 samples. But in order for that to work, that means that all the analysis files must still exist from the first analysis, which I am guessing means that the second analysis must occur in the same folder as the first analysis. Is that correct?
from drop.
Yes, you change the sample annotation in the same DROP project folder and then execute snakemake ... .
Because it's in the same folder, it will recognize the samples that are already processed and the ones that need to be processed.
from drop.
Ok thanks.
from drop.