Claudio, What are your thoughts on how iMOKA performs on a dataset w

Sample Variation Normalization about imoka HOT 2 CLOSED

ritchielabigh commented on June 16, 2024

Comments (2)

RitchieLabIGH commented on June 16, 2024

Hi Jake,
iMOKA normalizes the data by dividing each count by the total sum of k-mer counts, not by the number of k-mers:

$$ n_i = \frac{c_i}{\sum_j c_j} $$

where n_i is the normalized count of k-mer i, and c_i and c_j are the raw counts of k-mers i and j.
This normalizes for both read length and read count in a single step.
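As a minimal sketch of the normalization described above (not iMOKA's actual code), the snippet below divides each raw k-mer count by the sum of all k-mer counts in the sample. The function name `normalize_counts` and the dictionary layout are assumptions made for illustration, and any constant scaling factor iMOKA may apply on top is left out.

```python
def normalize_counts(raw_counts):
    """Divide each raw k-mer count by the total sum of counts in the sample.

    raw_counts: dict mapping k-mer string -> raw count (hypothetical layout).
    Returns a dict mapping k-mer string -> normalized count.
    """
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("sample has no k-mer counts to normalize")
    return {kmer: count / total for kmer, count in raw_counts.items()}


# Toy sample with three k-mers (made-up counts, for illustration only).
sample = {"ACGTA": 2, "CGTAC": 6, "GTACG": 2}
normalized = normalize_counts(sample)
print(normalized)
```

Because the denominator is the total count rather than the number of distinct k-mers, the normalized values of a sample always sum to 1.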
We have tested it on datasets in which some samples were sequenced three times deeper than the shallowest ones, and it works fine: the signal we saw was not biased by sequencing depth, and the abundances of k-mers representative of a given gene across the samples were similar to those computed using Salmon.
Nevertheless, if the groups are unbalanced, for example if one group contains most of the high-depth samples, you might see false signals from lowly expressed k-mers, because the low-depth samples may not have detected them due to their reduced resolution.
In the previous formula, we can replace the detected count (c) with the abundance of the k-mer (a) times the resolution of the current sample (r):

$$ n_i = \frac{a_i \, r}{\sum_j a_j \, r} = \frac{a_i}{\sum_j a_j} $$

As you can see, the normalized count is independent of the resolution.
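The cancellation of r can be checked numerically with a small idealised sketch. It assumes detected counts are exactly abundance times resolution, with no sampling noise and no rounding to integers, which real data will not satisfy. The names `simulate_counts` and `normalize`, and the abundance values, are hypothetical.

```python
def simulate_counts(abundances, resolution):
    """Idealised detected count: c = a * r (no sampling noise, no rounding)."""
    return {kmer: a * resolution for kmer, a in abundances.items()}


def normalize(counts):
    """Divide each count by the total sum of counts in the sample."""
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


# Made-up abundances shared by two samples sequenced at different depths.
abundances = {"kmer_a": 0.5, "kmer_b": 2.0, "kmer_c": 7.5}

shallow = normalize(simulate_counts(abundances, resolution=10))
deep = normalize(simulate_counts(abundances, resolution=30))  # three times deeper

# The normalized counts agree (up to floating-point error) despite the depth gap.
for kmer in abundances:
    print(kmer, shallow[kmer], deep[kmer])
```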
For a k-mer with an abundance of 0.5, you need an r of at least 2 to observe at least one copy of it in your sample. Furthermore, the preprocessing step of iMOKA has a min_count filter (5 by default) that ignores k-mers with a count lower than 5. In that case, you need an r of at least 10 for a k-mer with an abundance of 0.5 to be considered.
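The arithmetic above can be sketched as follows, assuming the idealised relation count = abundance × resolution and a filter that keeps k-mers whose count is at least min_count. The helper names `min_resolution` and `passes_filter` are hypothetical and are not part of iMOKA.

```python
def min_resolution(abundance, min_count=5):
    """Smallest resolution r such that abundance * r reaches min_count."""
    if abundance <= 0:
        raise ValueError("abundance must be positive")
    return min_count / abundance


def passes_filter(abundance, resolution, min_count=5):
    """True if the idealised count a * r is not below min_count."""
    return abundance * resolution >= min_count


# A k-mer with abundance 0.5 needs r >= 2 to be seen at least once ...
print(min_resolution(0.5, min_count=1))
# ... and r >= 10 to survive a min_count filter of 5.
print(min_resolution(0.5, min_count=5))

# With unbalanced depths, the same k-mer can be kept in a deep sample
# and dropped in a shallow one, which is how a false signal can arise.
print(passes_filter(0.5, resolution=18))  # deep sample
print(passes_filter(0.5, resolution=6))   # shallow sample
```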

Cheers,
Claudio

jakewendt commented on June 16, 2024

Understood. Thank you.
