Comments (2)
Hi Jake,
iMOKA normalizes the data by dividing each count by the total sum of the k-mer counts, not by the number of k-mers:

$$n_i = \frac{c_i}{\sum_j c_j}$$

where n_i is the normalized count of k-mer i, and c_i and c_j are the raw counts of k-mers i and j.
This normalizes for both read length and read count in one go.
We have tested it on datasets in which some samples were sequenced three times deeper than the shallowest ones, and it works fine: the signal we saw was not biased by sequencing depth, and the abundances of the k-mers representative of a given gene across the samples were similar to those computed using Salmon.
Nevertheless, if the groups are unbalanced, for example if one group contains most of the high-depth samples, you might get false signals for lowly expressed k-mers, because the low-depth samples may fail to detect them due to their reduced resolution.
In the previous formula, we can replace the detected count (c) with the abundance of the k-mer (a) times the resolution of the current sample (r):

$$n_i = \frac{a_i r}{\sum_j a_j r} = \frac{a_i}{\sum_j a_j}$$

As you can see, the normalized count is independent of the resolution.
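As a rough illustration of that cancellation, here is a minimal Python sketch. It is not iMOKA code: the function names `normalize` and `observe` and the abundance values are hypothetical, and it assumes an idealised sample in which the raw count is exactly a times r, with no sampling noise or rounding.

```python
import math

def normalize(counts):
    """Divide each raw k-mer count by the total of all counts in the sample."""
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

def observe(abundance, r):
    """Idealised raw counts c = a * r (assumption: no sampling noise)."""
    return {kmer: a * r for kmer, a in abundance.items()}

# Hypothetical abundances for three k-mers; the values are illustrative only.
abundance = {"ACGT": 0.5, "CGTA": 2.0, "GTAC": 7.5}

shallow = normalize(observe(abundance, r=10))  # low-depth sample
deep = normalize(observe(abundance, r=30))     # sample sequenced 3x deeper

# The resolution cancels out, so both samples give the same normalized counts.
same = all(math.isclose(shallow[k], deep[k]) for k in abundance)
print(same)
```

In real data the counts are integers and subject to sampling, so the two samples would agree only approximately, and k-mers with low abundance can drop out of the shallow sample entirely.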
For a k-mer with an abundance of 0.5, you need an r of at least 2 to expect at least one occurrence of that k-mer in your sample. Furthermore, the preprocessing step of iMOKA has a min_count filter (default 5) that ignores k-mers with a count lower than 5. In this case, you need an r of at least 10 for a k-mer with an abundance of 0.5 to be considered.
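The arithmetic behind those two thresholds can be sketched as follows. The helpers `min_resolution` and `passes_filter` are hypothetical names used only for illustration, and the sketch assumes the expected count is exactly a times r; only the default of 5 for min_count comes from the text above.

```python
def min_resolution(abundance, min_count=5):
    """Smallest r for which the expected count a * r reaches min_count."""
    return min_count / abundance

def passes_filter(abundance, r, min_count=5):
    """True if the expected count a * r is not below min_count."""
    return abundance * r >= min_count

# One expected occurrence of a k-mer with abundance 0.5 needs r = 2.
r_detect = min_resolution(0.5, min_count=1)

# Surviving the default min_count filter of 5 needs r = 10.
r_keep = min_resolution(0.5)

print(r_detect, r_keep)
print(passes_filter(0.5, r=9), passes_filter(0.5, r=10))
```

A sample with r below `r_keep` would drop this k-mer during preprocessing, which is how an unbalanced distribution of depth across groups can produce a false signal.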
Cheers,
Claudio
from imoka.
Understood. Thank you.