Zip inside zip with stored compression fails to open (rs-async-zip issue, closed)

majored commented on August 16, 2024

Zip inside zip with stored compression fails to open

from rs-async-zip.

Comments (4)

Majored commented on August 16, 2024

Reading in reverse is quite awkward, as you'd have to interleave the read operations with seeks, rather than seeking once and reading continuously from there as it currently does. It's the only thing I can see working, though.

AsyncDelimiterReader is also used elsewhere in this crate so directly modifying its behaviour isn't ideal.

Just wondering if anyone has any better ideas?

mobad commented on August 16, 2024

Here are a few possible solutions:

Add a seeking version that matches in reverse?
Read to the end into a cache, reverse-search the cache, then return the match index so the calling class can seek back?
Combine the end-of-CD reading and the matching into one function, so the calling class isn't forced to seek in order to parse the end of CD.
Return a list (iterator?) of match indexes; the calling class chooses the last one and seeks to it.
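
The "read into a cache and reverse-search it" option could look roughly like the sketch below. It is only an illustration, not the crate's code: rfind and last_eocd_offset are hypothetical names, and it assumes the tail of the file has already been read into memory and that the caller knows the absolute offset where that tail starts.

```rust
/// End of central directory signature ("PK\x05\x06") as it appears on disk.
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];

/// Returns the index of the LAST occurrence of `needle` in `haystack`.
fn rfind(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // `windows(0)` would panic, so reject an empty needle up front.
    if needle.is_empty() || needle.len() > haystack.len() {
        return None;
    }
    haystack.windows(needle.len()).rposition(|w| w == needle)
}

/// Hypothetical helper: `tail` is the cached end of the file and `tail_start`
/// is the absolute file offset of `tail[0]`. Returns the absolute offset the
/// caller would seek back to.
fn last_eocd_offset(tail: &[u8], tail_start: u64) -> Option<u64> {
    rfind(tail, &EOCD_SIGNATURE).map(|i| tail_start + i as u64)
}
```

Because the search runs from the back, a stored inner zip's signature earlier in the tail is skipped in favour of the outermost one.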

The last one isn't ideal, though: I imagine 99% of the time the end of CD will be very close to the end of the file, so it's searching through up to 64 KB of nothing until it gets to the last 22 or so bytes.

Most of these probably mean not using the Read trait though.

Majored commented on August 16, 2024

For the moment, I've altered it so that upon a match, the offset is stored and we continue reading until EOF, overwriting the offset whenever we encounter a later EOCDH. Afterward, we seek back to that last offset and continue where we left off.
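
As a rough illustration of the "remember the last match, keep reading to EOF" idea (not the code in the actual commit), here is a minimal synchronous sketch using std::io::Read. The function name last_match_offset and the buf_size parameter are assumptions made for the example; the real crate works with async readers.

```rust
use std::io::{self, Read};

/// Scans `reader` to EOF and returns the offset (relative to where scanning
/// started) of the LAST occurrence of `sig`. Earlier matches are overwritten.
fn last_match_offset<R: Read>(
    mut reader: R,
    sig: &[u8],
    buf_size: usize,
) -> io::Result<Option<u64>> {
    assert!(!sig.is_empty() && buf_size > 0);
    let mut last = None;
    let mut window: Vec<u8> = Vec::new(); // carried-over bytes + fresh bytes
    let mut window_start: u64 = 0; // stream offset of window[0]
    let mut chunk = vec![0u8; buf_size];

    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            return Ok(last); // EOF: the caller would now seek back to `last`
        }
        window.extend_from_slice(&chunk[..n]);

        if window.len() >= sig.len() {
            if let Some(i) = window.windows(sig.len()).rposition(|w| w == sig) {
                last = Some(window_start + i as u64);
            }
            // Keep the final sig.len() - 1 bytes so that a signature split
            // across two reads is still seen on the next iteration.
            let consumed = window.len() - (sig.len() - 1);
            window.drain(..consumed);
            window_start += consumed as u64;
        }
    }
}
```

The caller would add the returned offset to the position where scanning began and seek there before parsing the record.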

This is the best solution I can see for the moment that doesn't require a big alteration in how finding the EOCDH currently works. I've added a comment to revisit this in the future, but for now this seems to work well. I've tested with the example ZIP you provided and it reads successfully.

Should be fixed with: d9c44e1

Let me know if you still run into issues.

mobad commented on August 16, 2024

Thanks! It seems to have fixed those zips but I'm still having issues with my real test case unfortunately.

It sometimes works and sometimes doesn't, really weird.
I believe it just happens to hit an edge case where the delimiter is split by two reads.

I've created another test case here. Note it's only really reproducible when you shrink the buffer size to 4<=n<8 (or some small primes) to help split reads across a delimiter.
multiple_bad_bytes.zip
(Just multiple delimiters with an extra byte in between.)

I've also included the original zip I was having issues with. It should fail intermittently with the default buffer size, but will probably be easier to reproduce with smaller buffer sizes.
test_created.zip

One issue I think is in AsyncDelimiterReader.
If you get a partial match at the end of the read_slice then you will lose those bytes as read_slice_len is < match_index + delimiter.len() and they won't get prepended.
I think just adding an else that prepends match_index.. would work.
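
To make the partial-match case concrete, here is a small sketch of how the bytes to carry over could be computed. It is not AsyncDelimiterReader's code; partial_match_len and split_carry are hypothetical helpers, and the sketch assumes no full delimiter was found in the slice.

```rust
/// Length of the longest proper prefix of `delimiter` that `slice` ends with.
/// These are the bytes that might be the start of a delimiter completed by
/// the next read, so they must not be dropped.
fn partial_match_len(slice: &[u8], delimiter: &[u8]) -> usize {
    let max = delimiter.len().saturating_sub(1).min(slice.len());
    (1..=max)
        .rev()
        .find(|&k| slice.ends_with(&delimiter[..k]))
        .unwrap_or(0)
}

/// Splits `slice` into (bytes that are safe to hand to the caller,
/// bytes to prepend to the next read).
fn split_carry<'a>(slice: &'a [u8], delimiter: &[u8]) -> (&'a [u8], &'a [u8]) {
    let keep = partial_match_len(slice, delimiter);
    slice.split_at(slice.len() - keep)
}
```

With a 4-byte delimiter, at most 3 bytes are ever carried over, so the extra copying stays small.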

A smaller issue: I think b.set_filled(match_index); should be using index instead, so that it returns an accurate count of bytes read. (It doesn't really matter, as the bytes-read count isn't used, but read_cd could potentially rely on it to avoid the seeks needed to get the current offset.)

Another potential issue (which I believe won't happen with fixed buffer sizes): you do a read, get a match, and prepend the remaining bytes; then you call read again, this time with a smaller buffer, and it gets another match while bytes are still left in the prepend buffer. When the new leftover bytes are "prepended", they are added to the end of the prepend buffer instead of the beginning, which corrupts the data.
I think prepend should check whether the buffer is empty and, if it's not, actually prepend the new data rather than extend.
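
The ordering problem can be sketched with a tiny stand-in for the carry-over buffer. PrependBuffer and its methods are hypothetical and only illustrate the intended behaviour: leftover bytes go in front of whatever is still buffered, because they come earlier in the stream.

```rust
/// Hypothetical stand-in for the reader's internal carry-over buffer.
#[derive(Default)]
struct PrependBuffer {
    bytes: Vec<u8>,
}

impl PrependBuffer {
    /// Puts `data` in FRONT of whatever is still buffered. Using
    /// `extend_from_slice` on `self.bytes` directly would append instead
    /// and reorder the stream.
    fn prepend(&mut self, data: &[u8]) {
        let mut joined = Vec::with_capacity(data.len() + self.bytes.len());
        joined.extend_from_slice(data);
        joined.extend_from_slice(&self.bytes);
        self.bytes = joined;
    }

    /// Moves up to `out.len()` buffered bytes into `out`; returns the count.
    fn drain_into(&mut self, out: &mut [u8]) -> usize {
        let n = out.len().min(self.bytes.len());
        out[..n].copy_from_slice(&self.bytes[..n]);
        self.bytes.drain(..n);
        n
    }
}
```

When the buffer is empty, prepending and extending give the same result, which would explain why the problem doesn't show up with fixed buffer sizes.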

I think some unit tests for AsyncDelimiterReader with small buffer sizes might make things easier to test as well.
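
One way such tests could force split reads is a wrapper that caps how many bytes each read call returns. The sketch below uses the synchronous std::io::Read trait for brevity; ChunkedReader and read_sizes are hypothetical names, and an async equivalent would be needed to test AsyncDelimiterReader itself.

```rust
use std::io::{self, Read};

/// Hypothetical test helper: wraps a reader and never returns more than
/// `max` bytes per `read` call, so delimiters end up straddling reads.
/// `max` is expected to be greater than zero.
struct ChunkedReader<R> {
    inner: R,
    max: usize,
}

impl<R: Read> Read for ChunkedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = buf.len().min(self.max);
        self.inner.read(&mut buf[..n])
    }
}

/// Records the size of every read until EOF, to confirm the wrapper really
/// splits the input the way a test expects.
fn read_sizes<R: Read>(mut reader: R, buf_size: usize) -> io::Result<Vec<usize>> {
    let mut buf = vec![0u8; buf_size];
    let mut sizes = Vec::new();
    loop {
        match reader.read(&mut buf)? {
            0 => return Ok(sizes),
            n => sizes.push(n),
        }
    }
}
```

Running the same delimiter test over every max from 1 up to twice the delimiter length would cover each possible split position.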
