Comments (6)

CloXD commented on June 20, 2024

Hello Jake,
I think it's a memory problem caused by the large number of results (I have never tried iMOKA with WGS, but I expected it would produce lots of results).
Try increasing the general threshold (-T) to 90 and the source threshold (-t) to 95 (or even 95 and 99) to keep only the best results.
With larger cohorts, the accuracy values should be more reliable: if you kept the default values in the reduction step, you used 1/4 of the samples as the test set, which means 1 for each group. Take a look at the reduced matrix, and if the accuracies are all 100, it would be better to increase the number of samples in each group to 10 or increase the fraction of the test set ( -t ) to 0.4 (so with 5 samples, it will use 2 as test and 3 as training).
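The split arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `split_group` is made up, and rounding to the nearest integer is an assumption, since iMOKA's actual rounding rule is not stated here.

```python
from typing import Tuple


def split_group(n_samples: int, test_fraction: float) -> Tuple[int, int]:
    """Return (n_test, n_train) for a single group.

    Assumption: the test count is the fraction of the group rounded to
    the nearest integer. iMOKA's real rounding may differ.
    """
    n_test = round(n_samples * test_fraction)
    return n_test, n_samples - n_test


# Compare the default fraction (0.25) with the suggested 0.4 for 5 samples.
for fraction in (0.25, 0.4):
    n_test, n_train = split_group(5, fraction)
    print(f"fraction={fraction}: {n_test} test, {n_train} training")
```

With a single test sample per group, each pairwise accuracy is estimated from very few observations, which is why a larger test fraction or more samples per group is the safer choice.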
I hope this will help.
Cheers,
Claudio

from imoka.

jakewendt commented on June 20, 2024

Thanks again Claudio.

Initially, this was just a test of principle, so the accuracy of the results wasn't really that important. Once it is functioning, I am planning to run all available samples.

Not sure where to check for 100 as you suggested.

The reduced matrix did keep half a billion kmers which is quite a bit.

head 15/reduced.matrix
#{"adjustments":[0.25,0.05],"cross_validation":100,"file_in":"/francislab/data1/working/20200603-TCGA-GBMLGG-WGS/20210923-iMOKA-tumor-normal-test/15/matrix.json","file_out":"/francislab/data1/working/20200603-TCGA-GBMLGG-WGS/20210923-iMOKA-tumor-normal-test/15/reduced.matrix","kept":538537323,"min_acc":65.0,"minimum_count":5,"perc_test":0.25,"processed":864984338,"standard_error":0.5}
kmer	nMutant_x_nWT	nMutant_x_tMutant	nMutant_x_tWT	nWT_x_tMutant	nWT_x_tWT	tMutant_x_tWT	nMutant	nWT	tMutant	tWT
AAAAAAAAAAAAAAA	79.500	81.500	93.500	66.000	62.000	22.000	462462.289	504177.301	527753.348	534412.340
AAAAAAAAAAAAAAC	77.000	49.000	83.500	68.000	35.000	58.500	18289.031	23344.986	20507.831	22575.272
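For reference, the first line of the reduced matrix is a `#` followed by a JSON object, so the settings and counts can be read programmatically. The sketch below is a minimal example, assuming that layout holds for every reduced matrix; the header string is copied from the output above with the two long path fields left out, and `parse_header` is a hypothetical helper name.

```python
import json

# Header copied from the head output above; file_in and file_out are omitted.
header_line = (
    '#{"adjustments":[0.25,0.05],"cross_validation":100,'
    '"kept":538537323,"min_acc":65.0,"minimum_count":5,'
    '"perc_test":0.25,"processed":864984338,"standard_error":0.5}'
)


def parse_header(line: str) -> dict:
    """Strip the leading '#' and decode the JSON settings."""
    return json.loads(line.lstrip("#"))


header = parse_header(header_line)
kept_fraction = header["kept"] / header["processed"]
print(f"kept {header['kept']:,} of {header['processed']:,} k-mers ({kept_fraction:.1%})")
print("test fraction:", header["perc_test"], "| minimum accuracy:", header["min_acc"])
```

Here roughly 62% of the processed k-mers passed the reduction step, which fits the suspicion that the aggregation step is running out of memory.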

I also just noticed a new quirk with WGS, at least with paired data. The k-mer counts aren't canonical, as the reduced.matrix includes reverse complements. I'm guessing that they probably should be, given that half the reads are forward and half are reverse complement. That would mean going back to the preprocessing step, I think, and changing the library type. I'm assuming that the default library type is effectively ff. I'm gonna try fr. Suggestions there?
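To make "canonical" concrete: a common convention (assumed here, not something iMOKA is confirmed to use) is to represent a k-mer and its reverse complement by whichever of the two sorts first. A small Python sketch with hypothetical helper names:

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement every base, then reverse the sequence."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


kmer = "AAAAAAAAAAAAAAC"  # second k-mer from the matrix above
print(kmer, "<->", reverse_complement(kmer), "canonical:", canonical(kmer))
```

In a non-canonical matrix, both a k-mer and its reverse complement can appear as separate rows with separate counts.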

I'll make the mods to the aggregate that you suggested and rerun.

Thanks again,
Jake

CloXD commented on June 20, 2024

No problem.
The reduced matrix has accuracies other than 100, so that's fine (the accuracies are in the second through seventh columns).
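A quick way to run that check, sketched in Python on the two rows shown earlier in this thread. The assumption is that the matrix is tab-separated with the six pairwise accuracies in columns 2 to 7; `accuracies` is a hypothetical helper name.

```python
# Column header and the two data rows from the reduced.matrix excerpt (tab-separated).
excerpt = (
    "kmer\tnMutant_x_nWT\tnMutant_x_tMutant\tnMutant_x_tWT\tnWT_x_tMutant\t"
    "nWT_x_tWT\ttMutant_x_tWT\tnMutant\tnWT\ttMutant\ttWT\n"
    "AAAAAAAAAAAAAAA\t79.500\t81.500\t93.500\t66.000\t62.000\t22.000\t"
    "462462.289\t504177.301\t527753.348\t534412.340\n"
    "AAAAAAAAAAAAAAC\t77.000\t49.000\t83.500\t68.000\t35.000\t58.500\t"
    "18289.031\t23344.986\t20507.831\t22575.272\n"
)


def accuracies(row: str) -> list:
    """Return the pairwise accuracies (columns 2 to 7) of one data row."""
    fields = row.rstrip("\n").split("\t")
    return [float(value) for value in fields[1:7]]


data_rows = excerpt.splitlines()[1:]  # skip the column header
all_hundred = all(acc == 100.0 for row in data_rows for acc in accuracies(row))
print("every accuracy is exactly 100:", all_hundred)
```

If this printed `True` for the whole matrix, it would point to a test set too small to give meaningful accuracies.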
The k-mers are not canonical on purpose, to handle stranded RNA-seq.
Optimizing iMOKA for WGS would involve using canonical k-mers, adapting the aggregation step to them (all the steps that consider the k-mer sequence, such as the generation of the graphs, the mapping, etc.) and eventually discretizing the k-mer counts.
Those changes require lots of work (and a test dataset), but unfortunately my contract just ended and I don't know yet whether I'll continue to develop iMOKA in the future or whether someone else will.
Cheers,
Claudio

jakewendt commented on June 20, 2024

Will passing --library-type fr to preprocess correctly orient the extracted k-mers for paired reads when the pair is listed in the source file as ...?

sample	group	FILE_R1.fastq.gz;FILE_R2.fastq.gz

CloXD commented on June 20, 2024

Yes. It will take the file matching the RE /[]?[R]2[.]/ and convert it to its reverse complement (file 1 is associated with []?[R_]1[._]).
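As a rough illustration of that behaviour, the Python sketch below splits a source line into its fields and picks out the mate-2 file. The pattern `_R2[._]` is a simplified stand-in chosen for this example and is NOT iMOKA's actual regular expression; the tab-separated layout and both function names are assumptions as well.

```python
import re

# Hypothetical stand-in for the mate-2 pattern; not iMOKA's real expression.
MATE2 = re.compile(r"_R2[._]")


def parse_source_line(line: str):
    """Split 'sample<TAB>group<TAB>file1;file2' into its three parts."""
    sample, group, files = line.rstrip("\n").split("\t")
    return sample, group, files.split(";")


def files_to_reverse_complement(files):
    """With library type 'fr', only the mate-2 file would be reverse complemented."""
    return [name for name in files if MATE2.search(name)]


sample, group, files = parse_source_line(
    "sample\tgroup\tFILE_R1.fastq.gz;FILE_R2.fastq.gz"
)
print(sample, group, files_to_reverse_complement(files))
```

File names that do not follow an R1/R2 style convention would not be recognised by a pattern like this, so it is worth checking the naming before preprocessing.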

jakewendt commented on June 20, 2024

Just to close this off, I reran from preprocessing with --library-type fr, reduce with --test-percentage 0.5 and aggregate with --global-threshold 95 --origin-threshold 99 and the problem went away. The change in aggregate parameters is likely what stopped the seg fault.

Thanks again Claudio
