Comments (7)
I like it. Make the file descriptor a list of 'inodes' and I think you have something there. Reduces the cost of appending to the end of the file.
from pifs.
It seems like this would just be a tradeoff between compression speed and compression ratio. Since this is a novelty program, I personally think it'd have more use if the ratio were higher; at least then there would be SOME practical use for it, e.g. if transfer speed between computers were a heavy bottleneck (floppy drives). If write speed is the bottleneck, there are much better filesystems and compression algorithms that satisfy that niche, but a heavy compression algorithm has many more use cases imo.
from pifs.
Oh gosh, I completely forgot about this project.
I think it's a matter of difference in opinion rooted in whether you're thinking about files at rest or files in motion. Encoding a file in pifs could take an arbitrarily long time if you represent the file as a single offset. During that time you can report no progress to an interested party.
If you made the file contents into a series of offsets (leaving aside hamming distance and correction codes for a moment), then you could stream the encoding at creation time. This could be writing a new file on disk (moving from memory to disk) or lazily encoding a file for transport to another system. Additionally, as I mentioned before, appending to a file is cheap because the common prefix between the new and old file could be represented by the same bytes.
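A minimal sketch of that "series of offsets" idea. A short byte string stands in for pi's digit stream here, and the function names (`encode_chunks`, `decode_chunks`) are made up for illustration; real pifs would search pi itself, which is the expensive part this sketch ignores.

```python
# Sketch: encode a file as (offset, length) pairs into a digit stream.
# `stream` is a stand-in for pi's digits; searching pi is the hard part.

from typing import List, Tuple

def encode_chunks(data: bytes, stream: bytes, chunk_size: int = 2) -> List[Tuple[int, int]]:
    """Encode `data` as a list of (offset, length) pairs into `stream`."""
    pairs = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        pos = stream.find(chunk)  # brute-force search, as pifs does
        if pos < 0:
            raise ValueError("chunk not present in stream")
        pairs.append((pos, len(chunk)))
    return pairs

def decode_chunks(pairs: List[Tuple[int, int]], stream: bytes) -> bytes:
    return b"".join(stream[pos:pos + n] for pos, n in pairs)

# Appending bytes to `data` only appends new pairs; the pairs for the
# existing prefix are unchanged, so encoding can be streamed and
# progress reported chunk by chunk.
```

The append-is-cheap property falls out directly: the encoding of the old file is a prefix of the encoding of the extended file.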
from pifs.
I'd honestly say that pi-based compression is better as a program like 7-zip rather than a filesystem, but I'd gladly sacrifice speed for a heavy compression algorithm.
It seems to me (just based on intuition) that as the number of digits required to store the location in pi increases linearly, the amount of addressable data increases exponentially (this may not be the case, or the first match might sit so far out that no sensible person could store the offset, or take longer to find than anyone could actually be expected to wait).
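That intuition can be checked with back-of-the-envelope arithmetic, assuming pi's digits behave like uniform random noise (conjectured, not proven). The space an offset can address does grow exponentially with its length, but the offset needed to hit a *specific* file grows to match:

```python
import math

# In a random decimal stream, a specific n-byte pattern first appears
# around position 256**n on average, so writing that offset down takes
# about n * log10(256) ~= 2.41 decimal digits per byte of data.
def expected_offset_digits(n_bytes: int) -> float:
    return n_bytes * math.log10(256)

# The data itself, transcribed into decimal, also costs ~2.41 digits
# per byte, so on average the address is no shorter than the file it
# addresses.
```

In other words, the exponential addressability and the exponential rarity of any particular file cancel out, which is the point the later comments make via the pigeonhole principle.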
While this might have some application, I doubt that write speed would ever be very good, and it might not even be feasible for large amounts of data.
Though, error correction COULD be used to vastly decrease storage size, if we find a partial match early in pi and then correct the errors from there. I'd say that if we have an n-byte file, and we find a match that has more than n/2 bytes correct, we generate error-correction data based on that match and then write the location, ECC data, and length to a file; in some cases that could even be shorter than the file itself.
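A toy version of that scheme. Instead of real error-correction codes (Reed-Solomon or Hamming would be the grown-up choice), it just records the mismatch positions directly; `best_match`, `reconstruct`, and the strings below are illustrative stand-ins, not pifs code:

```python
from typing import List, Tuple

def best_match(data: bytes, stream: bytes) -> Tuple[int, List[Tuple[int, int]]]:
    """Slide `data` over `stream`; return the offset with the fewest
    mismatched bytes plus an (index, correct_byte) fix-up list."""
    n = len(data)
    best = None
    for off in range(len(stream) - n + 1):
        fixes = [(i, data[i]) for i in range(n) if stream[off + i] != data[i]]
        if best is None or len(fixes) < len(best[1]):
            best = (off, fixes)
    return best

def reconstruct(off: int, fixes: List[Tuple[int, int]], n: int, stream: bytes) -> bytes:
    out = bytearray(stream[off:off + n])
    for i, b in fixes:
        out[i] = b  # patch the bytes the approximate match got wrong
    return bytes(out)
```

Storing (offset, fixes, length) only pays off when the fix-up list is much smaller than the file, i.e. when well over half the bytes already agree, which is exactly the n/2 threshold suggested above.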
I might honestly just write my own version at this point. Thoughts?
from pifs.
From my experience, when I tried to store a small image, the file size of the pi-compressed file was dramatically larger than the file before compression. Could be the luck of the draw, but it could also be that the index value has to be massive to accommodate the data.
from pifs.
That's why the error correction data would be added: so that we could use a match much earlier in pi that still agrees with a majority of the file, storing less data.
from pifs.
Pigeonhole principle. If you want a library with an infinite number of books (which is what you're doing here), then the catalog is also infinite. The numbering on the book will on average be the same size as the contents of the book: in some cases shorter, in others longer. Compression just exploits the fact that some sequences of bytes are gibberish and not useful to human beings, so gibberish gets the longer numbers and comprehensible data the shorter ones. The Library of Congress has standards that don't admit books full of gibberish, so they only have to index books that make some sense. Finnegans Wake notwithstanding.
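The counting argument behind that claim takes only a few lines of plain arithmetic (nothing pifs-specific about it):

```python
def files_of_length(n: int) -> int:
    return 256 ** n  # number of distinct n-byte files

def shorter_names(n: int) -> int:
    # number of distinct byte strings strictly shorter than n bytes
    return sum(256 ** k for k in range(n))

# files_of_length(n) > shorter_names(n) for every n, so no lossless
# scheme can give every n-byte file a shorter name: on average the
# catalog number is as long as the book.
```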
But pi's digits are, as far as anyone can tell, statistically random noise (strictly speaking that's the conjecture that pi is normal; irrationality alone doesn't guarantee it). It's gibberish through and through. You're trying to find one sensible sequence of bytes in a sea of random noise. Fuzzy matching makes a smaller file only if you can approximate the output with a simpler pattern, essentially predicting what a more boring version of the file would look like. There is no simpler pattern here. The file is always going to take as many bytes as it will take.
The counterproposal I made doesn't improve space either. In fact it's a little worse. I'm constrained by the same laws of Information Theory as the rest of you. But it would improve time, and quite substantially.
But as we're essentially talking about BogoSort for files, there's not enough beer in this conversation I'm afraid :)
from pifs.