Hey! Thank you a lot for your great work! Your implementation is bla

Thanks <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-u

Hey <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url=

Yes, it is solvable. Thanks for explaining the theory. <p dir="a

Thanks <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-u

Detecting silent bit errors? about reed-solomon HOT 7 CLOSED

ronomon commented on May 23, 2024

Detecting silent bit errors?

from reed-solomon.

Comments (7)

jorangreef commented on May 23, 2024

Thanks @jue89 !

Just an example to check if I understand correctly:

We have 3 data shards (k=3) and 2 parity shards (m=2).
Of these 5 shards (k + m), any m shards (data or parity shards) can be damaged (i.e. have silent bit errors) and we won't know which (we don't have any checksums).
Passing all 5 shards to check() should then tell us which are damaged?

from reed-solomon.

jue89 commented on May 23, 2024

Hey @jorangreef !

Thank you for your response!

Almost. Afaik this should be a (mathematically) solvable scenario:

Passing all 5 shards to check() should then tell us which are damaged?

If 1 shard is damaged, it should tell us which one is damaged.
If 2 shards are damaged, it should tell us that a least 2 shards are damaged but it can't tell us which one.
If more than 2 shards are damaged it is possible that it can't detect the error and tells us that everything is fine.

The theory behind this is called Hamming distance. Reed-Solomon has a Hamming distance of n-k+1, whereas n=m+k. Thus with m=2 the Hamming distance is h=3. According to the Wikipedia article it should be able to detect h-1=2 errors and correct (h-1)/2=1 error.

Are you experienced with information theory? I had a lesson about that in university and could try to understand Reed-Solomon a little bit more and try to come up with a PR. But I can't promise that I am smart enough to do that ;)

from reed-solomon.

jorangreef commented on May 23, 2024

Yes, it is solvable.

Thanks for explaining the theory.

I think it would require encoding data/parity multiple times, for different possible combinations of corrupted shard (i.e. passing different sources/targets to encode()), then checking at each step if this produces the shards on hand.

Perhaps there's a more efficient algorithm than that but otherwise it's going to be computationally expensive (as k and m increase), even if there are a few intermediary caching optimizations, since you're not just encoding for a fixed set of known sources and targets but for combinations of k and m.

So it can be done, but it's costly I think.

I would rather leave that out of the module, since it can be done at a higher abstraction layer.

I would also advise using checksums to detect bit rot. This should be much more efficient?

from reed-solomon.

jorangreef commented on May 23, 2024

Where by checksums I mean SHA256.

from reed-solomon.

jue89 commented on May 23, 2024

Using cryptographic checksums like SHA256 is also costly ;)

I found some additional information about locating errors using a "syndrome": https://en.wikiversity.org/wiki/Reed–Solomon_codes_for_coders

Long story short: It could be a good idea to do the syndrome calculation also in the C++ part. It could benefit from the implementation of the mathematical operations that are already present and optimised for speed. If I understand the example correctly, it shouldn't be that different from calculating the parity shards.

I try to understand the math a little bit better. Would you be interested in merging a PR if I find a solution for this problem?

from reed-solomon.

jorangreef commented on May 23, 2024

Thanks @jue89 I will close this for now then. If you do come up with a solution please let me know.

from reed-solomon.

jorangreef commented on May 23, 2024

By the way, the Reed-Solomon codes for coders article is a great write-up and introduction.

Just on erasures (where corrupt shards are known) vs errors (where corrupt shards are unknown):

for every erasure, you just need one RS code character to repair it, while for every error you need two RS code characters (because you need to find the position in addition of the value/magnitude to correct). Such a decoder is called an errors-and-erasures decoder, or an errata decoder.

I am more interested in RS for erasure coding. I would still advise you to consider a checksum such as SHA256 since it will allow you to make more use of parity, giving you more storage efficiency, and more storage reliability, ceteris paribus.

Using cryptographic checksums like SHA256 is also costly ;)

Sure, everything has a cost. The cost of SHA256 is a single pass over the data (and you can use https://github.com/ronomon/crypto-async to do this in the threadpool). Using RS for error detection may be more expensive computationally, since it's mostly brute-force (with one or two optimizations) and since you need to do it every time a shard is corrupted (for the lifetime of the data). SHA256 checksums only need to be created once, the very first time you encode your shards.

SHA256 checksums also have the benefit that you can quickly detect bit-rot for a single shard, without having to read any other shards from the disk or network, saving you seeks and bandwidth.

For example, trying to use RS for error detection, you would need to read O(K+M) shards to detect errors in a single shard vs O(1) for SHA256. I don't know of storage implementations that would prefer RS for error detection over SHA256.

from reed-solomon.

Detecting silent bit errors? about reed-solomon HOT 7 CLOSED

Comments (7)

Related Issues (5)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent