Comments (4)
Reading in reverse is quite awkward, as you'd have to interleave seeks with the read operations, rather than seeking once and reading continuously from there as the code currently does. It's the only approach I can see working, though.
`AsyncDelimiterReader` is also used elsewhere in this crate, so directly modifying its behaviour isn't ideal.
Just wondering if anyone has any better ideas?
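Here is a minimal synchronous sketch of reverse reading that shows why it is awkward. It assumes a `std::io::Read + Seek` source rather than the crate's async reader, and `find_last_reverse` and its `chunk` parameter are hypothetical names, not part of rs-async-zip. Each iteration has to seek before it reads. The first few bytes of the previously read chunk are carried along so that a signature straddling two chunks is still found.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper (not part of rs-async-zip): scans `reader` backwards in
/// `chunk`-sized pieces and returns the absolute offset of the LAST occurrence
/// of `delim`, or `None` if it never occurs.
pub fn find_last_reverse<R: Read + Seek>(
    reader: &mut R,
    delim: &[u8],
    chunk: usize,
) -> io::Result<Option<u64>> {
    assert!(!delim.is_empty() && chunk > 0);
    let mut pos = reader.seek(SeekFrom::End(0))?;
    // Leading bytes of the chunk read previously (which lies just after `pos`),
    // kept so that a delimiter straddling a chunk boundary is still matched.
    let mut carry: Vec<u8> = Vec::new();
    while pos > 0 {
        let start = pos.saturating_sub(chunk as u64);
        let len = (pos - start) as usize;
        // Every step needs its own seek: reads only ever move forwards.
        reader.seek(SeekFrom::Start(start))?;
        let mut window = vec![0u8; len];
        reader.read_exact(&mut window)?;
        window.extend_from_slice(&carry);
        // Search right-to-left, so the first hit is the last occurrence in the file.
        if window.len() >= delim.len() {
            if let Some(i) = (0..=window.len() - delim.len())
                .rev()
                .find(|&i| &window[i..i + delim.len()] == delim)
            {
                return Ok(Some(start + i as u64));
            }
        }
        // Fewer than `delim.len()` bytes are carried, so no full match can hide in them.
        let keep = (delim.len() - 1).min(window.len());
        carry = window[..keep].to_vec();
        pos = start;
    }
    Ok(None)
}
```

Because the search runs from the end, the first hit is already the last occurrence, so nothing after it needs to be read.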
from rs-async-zip.
Here are a few possible solutions:
- Add a seekable version that matches in reverse?
- Read to the end into a cache, search the cache in reverse, then return the match index so the calling class can seek back?
- Combine the end-of-central-directory (EOCD) reading and the matching into one function, so the calling class isn't forced to seek back to parse the EOCD.
- Return a list (or an iterator) of match indices; the calling class picks the last one and seeks to it.
The last one isn't ideal, though: I imagine that 99% of the time the EOCD record will be very close to the end of the file, so it would search through up to 64 KB of nothing before reaching the last ~22 bytes.
Most of these probably mean not using the `Read` trait, though.
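The cache-based option could look roughly like the following sketch. It assumes a synchronous `Read + Seek` source, and `locate_eocd_cached` is a hypothetical name. It relies on the ZIP format's limits. The fixed part of the EOCD record is 22 bytes, and its comment is at most 65,535 bytes, so only the last 22 + 65,535 bytes ever need to be cached.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// EOCD signature 0x06054b50, stored little-endian on disk ("PK\x05\x06").
const EOCD_SIG: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// 22-byte fixed EOCD record plus the maximum u16 comment length.
const MAX_EOCD_SEARCH: u64 = 22 + u16::MAX as u64;

/// Hypothetical helper: reads the tail of the file into a cache once, searches
/// it in reverse, and returns the absolute offset of the last EOCD signature.
pub fn locate_eocd_cached<R: Read + Seek>(reader: &mut R) -> io::Result<Option<u64>> {
    let len = reader.seek(SeekFrom::End(0))?;
    let start = len.saturating_sub(MAX_EOCD_SEARCH);
    reader.seek(SeekFrom::Start(start))?;
    let mut cache = Vec::with_capacity((len - start) as usize);
    reader.read_to_end(&mut cache)?;
    // `windows` yields every 4-byte slice; `rposition` scans them from the end.
    let hit = cache
        .windows(EOCD_SIG.len())
        .rposition(|w| w == &EOCD_SIG[..]);
    Ok(hit.map(|i| start + i as u64))
}
```

A fuller implementation would probably also check that each candidate's comment-length field is consistent with the file length, because a stray signature can appear inside the comment or elsewhere. That check is omitted here.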
For the moment, I've altered it so that, on a match, the offset is stored and we continue reading until EOF, overwriting the stored offset whenever we encounter a later EOCDH. Afterwards, we seek back to that last offset and continue from there.
This is the best solution I can see for now that doesn't require a big change to how the EOCDH is currently located. I've added a comment to revisit this in the future, but for the moment it seems to work well. I've tested it with the example ZIP you provided, and that reads successfully.
This should be fixed by d9c44e1.
Let me know if you still run into issues.
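A synchronous sketch of that "remember the latest match and keep reading to EOF" approach is shown below. It assumes `std::io::Read` in place of the crate's async reader and uses the EOCD signature as the delimiter. `find_last_forward` is a hypothetical name, not the actual change in d9c44e1. Unlike a search that looks at each read in isolation, it carries the last `delim.len() - 1` bytes of every read forward, so a signature split across two reads is not missed.

```rust
use std::io::{self, Read};

/// Hypothetical sketch: scans forward with a `buf_size` buffer, overwriting
/// `last` on every match, and returns the offset of the final match at EOF.
pub fn find_last_forward<R: Read>(
    reader: &mut R,
    delim: &[u8],
    buf_size: usize,
) -> io::Result<Option<u64>> {
    assert!(!delim.is_empty() && buf_size > 0);
    let mut buf = vec![0u8; buf_size];
    let mut window: Vec<u8> = Vec::new(); // carried-over bytes + newly read bytes
    let mut window_start: u64 = 0; // absolute offset of window[0]
    let mut last = None;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // EOF
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        window.extend_from_slice(&buf[..n]);
        if window.len() >= delim.len() {
            for i in 0..=window.len() - delim.len() {
                if &window[i..i + delim.len()] == delim {
                    // Later matches overwrite earlier ones.
                    last = Some(window_start + i as u64);
                }
            }
        }
        // Keep only the tail that could still be the start of a split delimiter.
        let keep = (delim.len() - 1).min(window.len());
        let consumed = window.len() - keep;
        window.drain(..consumed);
        window_start += consumed as u64;
    }
    Ok(last)
}
```

The caller would then `seek(SeekFrom::Start(offset))` to the returned offset and continue parsing from there.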
Thanks! It seems to have fixed those zips, but unfortunately I'm still having issues with my real test case.
It sometimes works and sometimes doesn't, which is really weird.
I believe it just happens to hit an edge case where the delimiter is split across two reads.
I've created another test case here. Note that it's only really reproducible when you shrink the buffer size to 4 <= n < 8 (or some small primes), which makes it more likely that a read boundary falls inside a delimiter.
multiple_bad_bytes.zip
(Just multiple delimiters with an extra byte in between.)
I've also included the original zip I was having issues with. It should fail intermittently with the default buffer size, but it will probably be easier to reproduce with smaller sizes.
test_created.zip
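To make split delimiters reproducible without depending on the default buffer size, a test could wrap the input in a reader that never returns more than `max` bytes per call. Below is a minimal synchronous sketch. `ChunkedReader` is a hypothetical test helper, and an async version would apply the same cap inside `poll_read`.

```rust
use std::io::{self, Read};

/// Hypothetical test helper: wraps any reader and returns at most `max` bytes
/// per `read` call, whatever the size of the caller's buffer.
pub struct ChunkedReader<R> {
    inner: R,
    max: usize,
}

impl<R> ChunkedReader<R> {
    pub fn new(inner: R, max: usize) -> Self {
        assert!(max > 0, "max must be non-zero");
        Self { inner, max }
    }
}

impl<R: Read> Read for ChunkedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Only expose the first `max` bytes of the caller's buffer to the inner reader.
        let limit = buf.len().min(self.max);
        self.inner.read(&mut buf[..limit])
    }
}
```

With `max` set to 1, every multi-byte delimiter is necessarily split across reads, so the outcome no longer depends on where the default buffer happens to end.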
I think one issue is in `AsyncDelimiterReader`.
If you get a partial match at the end of `read_slice`, you will lose those bytes: `read_slice_len` is less than `match_index + delimiter.len()`, so they won't get prepended.
I think just adding an `else` branch that prepends `match_index..` would work.
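The bytes at risk are the ones at the end of the read slice that could be the start of a delimiter completed by the next read. Below is a minimal sketch of how to measure that tail. `partial_suffix_len` is a hypothetical helper, not part of the crate.

```rust
/// Hypothetical helper: length of the longest suffix of `buf` that is a proper
/// prefix of `delim`. Those trailing bytes may begin a delimiter that finishes
/// in the next read, so they must be kept (prepended) rather than dropped.
pub fn partial_suffix_len(buf: &[u8], delim: &[u8]) -> usize {
    let max = delim.len().saturating_sub(1).min(buf.len());
    // Try the longest candidate first so the whole possible prefix is retained.
    (1..=max)
        .rev()
        .find(|&k| buf[buf.len() - k..] == delim[..k])
        .unwrap_or(0)
}
```

Everything from `buf.len() - partial_suffix_len(buf, delim)` onward would then go into the prepend buffer instead of being discarded. Full matches are left to the normal search, and the helper only considers proper prefixes of the delimiter.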
A smaller issue: I think `b.set_filled(match_index);` should use `index` instead, so that it returns an accurate count of bytes read. (It doesn't really matter right now, since the count isn't used, but `read_cd` could potentially rely on it to avoid the seeks needed to get the current offset.)
Another potential issue (though I believe it won't happen with fixed buffer sizes): suppose you do a read, get a match, and prepend the remaining bytes. Then you call read again, this time with a smaller buffer, and get another match. There will still be bytes left in the prepend buffer, so when the new leftover bytes are "prepended" they will be added to the end of the prepend buffer instead of the beginning, which will corrupt the data.
I think `prepend` should check whether the buffer is empty and, if it isn't, insert the new data at the front rather than extending the buffer.
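Below is a minimal sketch of the difference, using a plain `Vec<u8>` as a stand-in for the prepend buffer. The real buffer type in `AsyncDelimiterReader` may differ, and both function names are hypothetical. The new leftover bytes come from earlier in the stream than whatever is still pending, so they have to go in front.

```rust
/// Hypothetical: the behaviour described above, which appends new leftovers
/// after bytes that are still pending and so reorders the stream.
pub fn prepend_buggy(pending: &mut Vec<u8>, leftover: &[u8]) {
    pending.extend_from_slice(leftover);
}

/// Hypothetical fix: if bytes are still pending, place the new leftovers
/// before them so the original stream order is preserved.
pub fn prepend_fixed(pending: &mut Vec<u8>, leftover: &[u8]) {
    if pending.is_empty() {
        pending.extend_from_slice(leftover);
    } else {
        let still_pending = std::mem::take(pending);
        pending.extend_from_slice(leftover);
        pending.extend_from_slice(&still_pending);
    }
}
```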
I think some unit tests for `AsyncDelimiterReader` with small buffer sizes would make this easier to test as well.
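A small test harness along those lines could compare any search strategy against a naive whole-buffer search for every buffer size from 1 up to `max_buf`. Below is a sketch of that harness. The names are hypothetical, it uses synchronous `std::io` types rather than the crate's async reader, and `per_read_matches` is a deliberately broken strategy, included only to show that tiny buffer sizes expose split delimiters immediately.

```rust
use std::io::{self, Read};

/// Reference answer: every offset at which `delim` occurs in `data`.
pub fn naive_matches(data: &[u8], delim: &[u8]) -> Vec<u64> {
    if delim.is_empty() || data.len() < delim.len() {
        return Vec::new();
    }
    (0..=data.len() - delim.len())
        .filter(|&i| &data[i..i + delim.len()] == delim)
        .map(|i| i as u64)
        .collect()
}

/// Deliberately naive strategy: inspects each read in isolation, so it misses
/// any delimiter split across two reads.
pub fn per_read_matches<R: Read>(
    reader: &mut R,
    delim: &[u8],
    buf_size: usize,
) -> io::Result<Vec<u64>> {
    let mut buf = vec![0u8; buf_size];
    let mut offset = 0u64;
    let mut found = Vec::new();
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(found);
        }
        found.extend(naive_matches(&buf[..n], delim).into_iter().map(|i| offset + i));
        offset += n as u64;
    }
}

/// Hypothetical harness: runs `search` with every buffer size in `1..=max_buf`
/// and returns the first size whose result differs from `naive_matches`.
pub fn first_failing_buffer_size<F>(
    data: &[u8],
    delim: &[u8],
    max_buf: usize,
    search: F,
) -> Option<usize>
where
    F: Fn(&mut io::Cursor<Vec<u8>>, &[u8], usize) -> io::Result<Vec<u64>>,
{
    let expected = naive_matches(data, delim);
    (1..=max_buf).find(|&size| {
        let mut cursor = io::Cursor::new(data.to_vec());
        match search(&mut cursor, delim, size) {
            Ok(found) => found != expected,
            Err(_) => true,
        }
    })
}
```

For an input loosely modelled on the description of multiple_bad_bytes.zip (three signatures, each followed by one extra byte), `per_read_matches` already fails at buffer size 1, while a strategy that reads everything before searching passes for every size.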