Comments (7)
I like it. Make the file descriptor a list of 'inodes' and I think you have something there. Reduces the cost of appending to the end of the file.
from pifs.
It seems like this would just be a tradeoff between compression speed and compression ratio. Since this is a novelty program, I personally think it'd have more use if the ratio were higher; at least then there would be SOME practical use for it, e.g. if transfer speed between computers were a heavy bottleneck (floppy drives). If write speed is the bottleneck, there are much better filesystems and compression algorithms that satisfy that niche, but a heavy compression algorithm has many more use cases imo.
from pifs.
Oh gosh, I completely forgot about this project.
I think it's a matter of difference in opinion rooted in whether you're thinking about files at rest or files in motion. Encoding a file in pifs could take an arbitrarily long time if you represent the file as a single offset. During that time you can report no progress to an interested party.
If you made the file contents into a series of offsets (leaving aside hamming distance and correction codes for a moment), then you could stream the encoding at creation time. This could be writing a new file on disk (moving from memory to disk) or lazily encoding a file for transport to another system. Additionally, as I mentioned before, appending to a file is cheap because the common prefix between the new and old file could be represented by the same bytes.
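A minimal sketch of that "series of offsets" idea. A short byte string stands in for pi's digit stream here, and the function names (`encode_chunks`, `decode_chunks`) are made up for illustration; real pifs would search pi itself, which is the expensive part this sketch ignores.

```python
# Sketch: encode a file as (offset, length) pairs into a digit stream.
# `stream` is a stand-in for pi's digits; searching pi is the hard part.

from typing import List, Tuple

def encode_chunks(data: bytes, stream: bytes, chunk_size: int = 2) -> List[Tuple[int, int]]:
    """Encode `data` as a list of (offset, length) pairs into `stream`."""
    pairs = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        pos = stream.find(chunk)  # brute-force search, as pifs does
        if pos < 0:
            raise ValueError("chunk not present in stream")
        pairs.append((pos, len(chunk)))
    return pairs

def decode_chunks(pairs: List[Tuple[int, int]], stream: bytes) -> bytes:
    return b"".join(stream[pos:pos + n] for pos, n in pairs)

# Appending bytes to `data` only appends new pairs; the pairs for the
# existing prefix are unchanged, so encoding can be streamed and
# progress reported chunk by chunk.
```

The append-is-cheap property falls out directly: the encoding of the old file is a prefix of the encoding of the extended file.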
from pifs.
I'd honestly say that pi-based compression is better as a program like 7-zip rather than a filesystem, but I'd gladly sacrifice speed for a heavy compression algorithm.
It seems to me (just based on intuition) that as the number of digits required to store the location in pi increases linearly, the amount of addressable data increases exponentially (this may not be the case, or the first match might sit so far out that no sensible person could store the offset, or take longer to find than anyone could actually be expected to wait).
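That intuition can be checked with back-of-the-envelope arithmetic, assuming pi's digits behave like uniform random noise (conjectured, not proven). The space an offset can address does grow exponentially with its length, but the offset needed to hit a *specific* file grows to match:

```python
import math

# In a random decimal stream, a specific n-byte pattern first appears
# around position 256**n on average, so writing that offset down takes
# about n * log10(256) ~= 2.41 decimal digits per byte of data.
def expected_offset_digits(n_bytes: int) -> float:
    return n_bytes * math.log10(256)

# The data itself, transcribed into decimal, also costs ~2.41 digits
# per byte, so on average the address is no shorter than the file it
# addresses.
```

In other words, the exponential addressability and the exponential rarity of any particular file cancel out, which is the point the later comments make via the pigeonhole principle.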
While this might have some application, I doubt that write speed would ever be very good, and it might not even be feasible for large amounts of data.
Though, error correction COULD be used to vastly decrease storage size, if we find a partial match early in pi and then correct the errors from there. I'd say that if we have an n-byte file, and we find a match that has more than n/2 bytes correct, we generate error-correction data based on that match and then write the location, ECC data, and length to a file; in some cases that could even be shorter than the file itself.
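A toy version of that scheme. Instead of real error-correction codes (Reed-Solomon or Hamming would be the grown-up choice), it just records the mismatch positions directly; `best_match`, `reconstruct`, and the strings below are illustrative stand-ins, not pifs code:

```python
from typing import List, Tuple

def best_match(data: bytes, stream: bytes) -> Tuple[int, List[Tuple[int, int]]]:
    """Slide `data` over `stream`; return the offset with the fewest
    mismatched bytes plus an (index, correct_byte) fix-up list."""
    n = len(data)
    best = None
    for off in range(len(stream) - n + 1):
        fixes = [(i, data[i]) for i in range(n) if stream[off + i] != data[i]]
        if best is None or len(fixes) < len(best[1]):
            best = (off, fixes)
    return best

def reconstruct(off: int, fixes: List[Tuple[int, int]], n: int, stream: bytes) -> bytes:
    out = bytearray(stream[off:off + n])
    for i, b in fixes:
        out[i] = b  # patch the bytes the approximate match got wrong
    return bytes(out)
```

Storing (offset, fixes, length) only pays off when the fix-up list is much smaller than the file, i.e. when well over half the bytes already agree, which is exactly the n/2 threshold suggested above.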
I might honestly just write my own version at this point. Thoughts?
from pifs.
From my experience, when I tried to store a small image, the file size of the pi-compressed file was dramatically larger than the file before compression. Could be the luck of the draw, but it could also be that the index value has to be massive to accommodate the data.
from pifs.
That's why the error correction data would be added: so that we could use a match much earlier in pi that still agrees with a majority of the file, storing less data.
from pifs.
Pigeonhole principle. If you want a library with an infinite number of books (which is what you're doing here), then the catalog is also infinite. The numbering on the book will on average be the same size as the contents of the book: in some cases shorter, in others longer. Compression just exploits the fact that some sequences of bytes are gibberish and not useful to human beings, so gibberish gets the longer numbers and comprehensible data the shorter ones. The Library of Congress has standards that don't admit books full of gibberish, so they only have to index books that make some sense. Finnegans Wake notwithstanding.
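The counting argument behind that claim takes only a few lines of plain arithmetic (nothing pifs-specific about it):

```python
def files_of_length(n: int) -> int:
    return 256 ** n  # number of distinct n-byte files

def shorter_names(n: int) -> int:
    # number of distinct byte strings strictly shorter than n bytes
    return sum(256 ** k for k in range(n))

# files_of_length(n) > shorter_names(n) for every n, so no lossless
# scheme can give every n-byte file a shorter name: on average the
# catalog number is as long as the book.
```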
But pi's digits are, as far as anyone can tell, statistically random noise (strictly speaking that's the conjecture that pi is normal; irrationality alone doesn't guarantee it). It's gibberish through and through. You're trying to find one sensible sequence of bytes in a sea of random noise. Fuzzy matching makes a smaller file only if you can approximate the output with a simpler pattern, essentially predicting what a more boring version of the file would look like. There is no simpler pattern here. The file is always going to take as many bytes as it will take.
The counterproposal I made doesn't improve space either. In fact it's a little worse. I'm constrained by the same laws of Information Theory as the rest of you. But it would improve time, and quite substantially.
But as we're essentially talking about BogoSort for files, there's not enough beer in this conversation I'm afraid :)
from pifs.