Comments (6)
Hello Jake,
I think it's a memory problem caused by the large number of results (I have never tried iMOKA with WGS, but I imagined there would be lots of results).
Try increasing the global threshold (-T) to 90 and the origin threshold (-t) to 95 (or even 95 and 99) to keep only the best results.
With larger cohorts, the accuracy values should be more reliable. If you kept the default values in the reduction step, you used 1/4 of the samples as the test set, which means 1 sample for each group. Take a look at the reduced matrix: if the accuracies are all 100, it would be better to increase the number of samples in each group to 10, or to increase the fraction of the test set (-t) to 0.4 (so with 5 samples, it will use 2 for testing and 3 for training).
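The split arithmetic in the reply above can be checked with a few lines of Python. This is a minimal sketch assuming the per-group test count is the group size times the test fraction, rounded down; the exact rounding rule used by iMOKA's reduce step is an assumption here, and `split_counts` is a hypothetical helper, not part of iMOKA.

```python
from __future__ import annotations

import math


def split_counts(n_samples: int, test_fraction: float) -> tuple[int, int]:
    """Return (test, training) sample counts for one group.

    Assumption: test count = floor(n_samples * test_fraction);
    iMOKA's actual rounding rule is not confirmed here.
    """
    n_test = math.floor(n_samples * test_fraction)
    return n_test, n_samples - n_test


for frac in (0.25, 0.4):
    test, train = split_counts(5, frac)
    print(f"test fraction {frac}: {test} test, {train} training")
```

With 5 samples per group, 0.25 leaves a single test sample, which is why a handful of samples can easily produce accuracies of exactly 100.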
I hope this will help.
Cheers,
Claudio
from imoka.
Thanks again Claudio.
Initially, this was just a proof-of-principle test, so the accuracy of the results wasn't really that important. Once it's working, I'm planning to run all available samples.
I'm not sure where to check for 100 as you suggested.
The reduced matrix did keep over half a billion kmers (538,537,323), which is quite a lot.
head 15/reduced.matrix
#{"adjustments":[0.25,0.05],"cross_validation":100,"file_in":"/francislab/data1/working/20200603-TCGA-GBMLGG-WGS/20210923-iMOKA-tumor-normal-test/15/matrix.json","file_out":"/francislab/data1/working/20200603-TCGA-GBMLGG-WGS/20210923-iMOKA-tumor-normal-test/15/reduced.matrix","kept":538537323,"min_acc":65.0,"minimum_count":5,"perc_test":0.25,"processed":864984338,"standard_error":0.5}
kmer nMutant_x_nWT nMutant_x_tMutant nMutant_x_tWT nWT_x_tMutant nWT_x_tWT tMutant_x_tWT nMutant nWT tMutant tWT
AAAAAAAAAAAAAAA 79.500 81.500 93.500 66.000 62.000 22.000 462462.289 504177.301 527753.348 534412.340
AAAAAAAAAAAAAAC 77.000 49.000 83.500 68.000 35.000 58.500 18289.031 23344.986 20507.831 22575.272
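The first line of reduced.matrix is a `#` followed by a JSON object with the settings of the reduce run. A minimal Python sketch (the helper name is hypothetical, not an iMOKA function) that parses that line and reports what fraction of the processed k-mers were kept, using the values from the header above with the file paths omitted:

```python
import json


def read_reduce_header(line: str) -> dict:
    """Parse the '#{...}' settings line at the top of a reduced.matrix file."""
    if not line.startswith("#"):
        raise ValueError("expected a header line starting with '#'")
    return json.loads(line[1:])


# Header from the excerpt above, with "file_in"/"file_out" left out for brevity.
header_line = (
    '#{"adjustments":[0.25,0.05],"cross_validation":100,'
    '"kept":538537323,"min_acc":65.0,"minimum_count":5,'
    '"perc_test":0.25,"processed":864984338,"standard_error":0.5}'
)
header = read_reduce_header(header_line)
kept_fraction = header["kept"] / header["processed"]
print(f"kept {header['kept']:,} of {header['processed']:,} k-mers ({kept_fraction:.1%})")
print(f"test fraction used by reduce: {header['perc_test']}")
```

On a real file, the same helper can be fed the first line directly, e.g. `read_reduce_header(open("15/reduced.matrix").readline())`.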
I also just noticed a new quirk with WGS, at least with paired data. The kmer counts aren't canonical, as the reduced.matrix includes reverse complements. I'm guessing they probably should be, given that half the reads are forward and half are reverse complement. That would mean going back to the preprocessing step, I think, and changing the library type. I'm assuming that the default library type is effectively ff. I'm going to try fr. Any suggestions there?
I'll make the changes to the aggregate parameters that you suggested and rerun.
Thanks again,
Jake
No problem.
The reduced matrix has accuracies other than 100 (the second through seventh columns), so that's fine.
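Since columns two through seven of reduced.matrix hold the accuracies, checking for "only 100" can be automated. A small sketch, assuming whitespace-separated rows laid out like the excerpt earlier in the thread (k-mer, six pairwise accuracies, then four per-group values); `accuracies_all_100` is a hypothetical helper, not part of iMOKA.

```python
from typing import Iterable


def accuracies_all_100(rows: Iterable[str]) -> bool:
    """Return True if every accuracy value (columns 2-7) equals 100.

    Assumes whitespace-separated rows: k-mer, six pairwise accuracies,
    then per-group values, as in the reduced.matrix excerpt.
    Stops at the first accuracy that is not 100.
    """
    for row in rows:
        if not row.strip() or row.startswith("#") or row.startswith("kmer"):
            continue  # skip blank lines, the JSON header and the column header
        fields = row.split()
        accuracies = [float(x) for x in fields[1:7]]
        if any(acc != 100.0 for acc in accuracies):
            return False
    return True


sample = [
    "kmer nMutant_x_nWT nMutant_x_tMutant nMutant_x_tWT nWT_x_tMutant nWT_x_tWT tMutant_x_tWT nMutant nWT tMutant tWT",
    "AAAAAAAAAAAAAAA 79.500 81.500 93.500 66.000 62.000 22.000 462462.289 504177.301 527753.348 534412.340",
    "AAAAAAAAAAAAAAC 77.000 49.000 83.500 68.000 35.000 58.500 18289.031 23344.986 20507.831 22575.272",
]
print(accuracies_all_100(sample))
```

Because it accepts any iterable of lines, an open file handle can be passed directly, which avoids loading a matrix with hundreds of millions of rows into memory.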
The k-mers are not canonical on purpose to handle stranded RNA-seq.
An optimization of iMOKA for WGS would include the use of canonical k-mers, the adaptation of the aggregation step for canonical k-mers (all the steps that consider the k-mer sequence, such as graph generation, mapping, etc.), and possibly a discretization of the k-mer counts.
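For reference, the usual definition of a canonical k-mer is the lexicographically smaller of the k-mer and its reverse complement, so counts from both strands collapse onto one key. This is a generic sketch of that idea, not something iMOKA currently does (its k-mers are strand-specific on purpose); the function names and the toy counts are hypothetical, and only uppercase ACGTN input is handled.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base (uppercase ACGTN only) and reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def merge_strands(counts: dict) -> dict:
    """Sum the counts of each k-mer and its reverse complement under one canonical key."""
    merged = {}
    for kmer, count in counts.items():
        key = canonical(kmer)
        merged[key] = merged.get(key, 0) + count
    return merged


kmer = "AAAAAAAAAAAAAAC"  # taken from the reduced.matrix excerpt
rc = reverse_complement(kmer)
print(rc, canonical(rc))
print(merge_strands({kmer: 3, rc: 2}))  # toy counts for illustration
```

With canonical keys, a k-mer and its reverse complement would no longer appear as separate rows, which is the behaviour noticed with paired WGS data.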
Those changes require a lot of work (and a test dataset), but unfortunately my contract just ended and I don't know yet whether I'll continue to develop iMOKA in the future or if someone else will.
Cheers,
Claudio
Will passing --library-type fr to preprocess correctly orient the extracted kmers for paired-end sequences when they are passed in the source file as follows?
sample group FILE_R1.fastq.gz;FILE_R2.fastq.gz
Yes. It will take the file matching the RE /[]?[R]2[.]/ and convert it to its reverse complement (file 1 is the one matching []?[R_]1[._]).
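A sketch of what that orientation step amounts to for a source line like `sample group FILE_R1.fastq.gz;FILE_R2.fastq.gz`: split the file field on `;`, pick out the mate-2 file, and reverse-complement its reads so both mates end up on the same strand. The regular expressions in the answer above are garbled in this copy of the thread, so `MATE2_PATTERN` below is a hypothetical stand-in, not iMOKA's actual pattern; whitespace-separated fields are also an assumption.

```python
import re

# Hypothetical mate-2 pattern (matches e.g. "_R2." or "_2_"); NOT iMOKA's actual regex.
MATE2_PATTERN = re.compile(r"[_.]R?2[._]")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base (uppercase ACGTN only) and reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_source_line(line: str):
    """Split 'sample group FILE_R1...;FILE_R2...' into (sample, group, [files])."""
    sample, group, files = line.split()
    return sample, group, files.split(";")


def orient_read(path: str, seq: str) -> str:
    """For an 'fr' library, reverse-complement mate-2 reads and keep mate-1 reads as-is."""
    if MATE2_PATTERN.search(path):
        return reverse_complement(seq)
    return seq


sample, group, files = parse_source_line("sample group FILE_R1.fastq.gz;FILE_R2.fastq.gz")
for path in files:
    print(path, orient_read(path, "AACGT"))
```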
Just to close this off: I reran from preprocessing with --library-type fr, reduce with --test-percentage 0.5, and aggregate with --global-threshold 95 --origin-threshold 99, and the problem went away. The change in aggregate parameters is likely what stopped the segfault.
Thanks again, Claudio
Related Issues
- iMOKA GUI
- Empty aggregated.sequences.bed.norep.bed causes seg fault
- Memory control
- Aggregate step crash because "Kept 0 alignments"
- "{} Message" when opening K-mer list
- Wonky reduce thread
- Plotting legend issue with PCA
- Sample Variation Normalization
- the singularity image
- Confidence Interval for ROC AUC metric (Random Forest)
- Error when starting singularity exec iMOKA preprocess.sh -i test
- --threads does not work?
- No error handling when there are no kmers above the thresholds counts at "aggregate" step
- How to run it in my Mac?
- How to incorporate paired end reads?
- File not found error in iMOKA aggregate step
- cant get singularity to run
- Preprocessing not producing the "sorted.bin" files
- aggregate produces many lines of "stdtr domain error"