
Comments (4)

Majored commented on August 16, 2024

Reading in reverse is quite awkward, as you'd have to interleave the read operations with seeks, instead of the current approach where you seek once and read continuously from there. It's the only thing I can see working, though.

AsyncDelimiterReader is also used elsewhere in this crate so directly modifying its behaviour isn't ideal.

Just wondering if anyone has any more optimal ideas?

from rs-async-zip.

mobad commented on August 16, 2024

Here are a few possible solutions:

  1. Add a seeking version that matches in reverse?
  2. Read to the end into a cache, reverse-search the cache, then return the match index so the calling class can seek back?
  3. Combine the end-of-CD reading and the matching into one function, so the calling class isn't forced to seek in order to parse the end of CD.
  4. Return a list (iterator?) of match indexes; the calling class chooses the last one and seeks to it.

The last one isn't ideal, though: I imagine that 99% of the time the end of the central directory will be very close to the end of the file, so it would search through 64 KB of nothing before reaching the last 22 or so bytes.

Most of these probably mean not using the Read trait though.
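
As a rough illustration of option 2, the sketch below reads only the tail of the file into a cache and searches it backwards for the end-of-central-directory signature. It uses the synchronous std::io traits as a stand-in for this crate's async reader, and the names rfind_eocd, locate_eocd and MAX_TAIL are hypothetical, not part of rs-async-zip.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// End-of-central-directory signature as stored on disk (0x06054b50, little-endian).
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];

/// Fixed 22-byte EOCD record plus the largest possible trailing comment (u16 length).
const MAX_TAIL: u64 = 22 + 65_535;

/// Index of the last occurrence of the signature in `tail`, if any (hypothetical helper).
fn rfind_eocd(tail: &[u8]) -> Option<usize> {
    tail.windows(EOCD_SIGNATURE.len())
        .rposition(|window| window == &EOCD_SIGNATURE[..])
}

/// Caches at most MAX_TAIL bytes from the end of the stream, searches them in
/// reverse, and returns the absolute offset of the last signature found.
fn locate_eocd<R: Read + Seek>(reader: &mut R) -> io::Result<Option<u64>> {
    let len = reader.seek(SeekFrom::End(0))?;
    let tail_len = len.min(MAX_TAIL);
    let tail_start = len - tail_len;

    reader.seek(SeekFrom::Start(tail_start))?;
    let mut tail = Vec::with_capacity(tail_len as usize);
    reader.read_to_end(&mut tail)?;

    Ok(rfind_eocd(&tail).map(|index| tail_start + index as u64))
}
```

The caller would then seek to the returned offset itself, which is the part that does not fit a plain Read-style interface.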


Majored commented on August 16, 2024

For the moment, I've altered it so that upon the match, the offset is stored & we continue reading until EOF whilst overwriting the offset when we encounter later EOCDHs. Afterward, we seek back to that last offset and continue where we left off.
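
A minimal synchronous sketch of that idea, not the code from the commit: scan forward to EOF, overwrite the stored offset on every signature match, then seek back to the last one. The function name seek_to_last_eocd is hypothetical, the std::io traits stand in for the crate's async ones, and leaving the cursor at the start of the signature (rather than just after it) is an assumption.

```rust
use std::io::{self, BufReader, Read, Seek, SeekFrom};

/// The signature bytes `PK\x05\x06` packed big-endian, so they can be compared
/// against a rolling window of the last four bytes read.
const EOCD_WINDOW: u32 = 0x504b_0506;

/// Scans from the start to EOF, remembers the offset of the last signature,
/// and seeks back to it. Returns that offset, or None if nothing matched.
fn seek_to_last_eocd<R: Read + Seek>(reader: &mut R) -> io::Result<Option<u64>> {
    reader.seek(SeekFrom::Start(0))?;

    let mut last = None;
    let mut window: u32 = 0;
    for (i, byte) in BufReader::new(&mut *reader).bytes().enumerate() {
        // Shift the newest byte in; the oldest byte falls off the top.
        window = (window << 8) | u32::from(byte?);
        if i >= 3 && window == EOCD_WINDOW {
            // Overwrite any earlier match with the later one.
            last = Some(i as u64 - 3);
        }
    }

    if let Some(offset) = last {
        reader.seek(SeekFrom::Start(offset))?;
    }
    Ok(last)
}
```

Because the window is updated one byte at a time, this particular sketch does not care where the underlying read boundaries fall.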

This is the best solution I can see for the moment that doesn't require a big alteration in how finding the EOCDH currently works. I've added a comment to revisit this in the future, but for the moment, this seems to work well. I've tested with the example ZIP you provided and that successfully reads.

Should be fixed with: d9c44e1

Let me know if you still run into issues.


mobad commented on August 16, 2024

Thanks! It seems to have fixed those zips but I'm still having issues with my real test case unfortunately.

It sometimes works and sometimes doesn't, really weird.
I believe it just happens to hit an edge case where the delimiter is split by two reads.

I've created another test case here. Note that it's only really reproducible when you shrink the buffer size to 4 <= n < 8 (or some small primes) to help split reads across a delimiter.
multiple_bad_bytes.zip
(Just multiple delimiters with an extra byte in between.)

I've also included the original zip I was having issues with that should intermittently fail with the default buffer size but will probably be easier to reproduce with some smaller numbers.
test_created.zip

One issue, I think, is in AsyncDelimiterReader.
If you get a partial match at the end of read_slice, you lose those bytes: read_slice_len is < match_index + delimiter.len(), so they don't get prepended.
I think just adding an else that prepends match_index.. would work.
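
To make the partial-match case concrete, here is a small sketch of how the trailing bytes that need carrying over could be detected. partial_suffix_len is a hypothetical helper written for this comment, not a function in AsyncDelimiterReader, and the crate's own matching logic may be structured differently.

```rust
/// Length of the longest suffix of `chunk` that is also a proper prefix of
/// `delimiter`. A non-zero result means the chunk ends mid-delimiter, so that
/// many bytes must be kept and prepended to the next read instead of dropped.
fn partial_suffix_len(chunk: &[u8], delimiter: &[u8]) -> usize {
    // A proper prefix is at most delimiter.len() - 1 bytes long.
    let max = chunk.len().min(delimiter.len().saturating_sub(1));
    (1..=max)
        .rev()
        .find(|&n| chunk[chunk.len() - n..] == delimiter[..n])
        .unwrap_or(0)
}
```

For a chunk ending in "PK" and the delimiter "PK\x05\x06" this yields 2, which are exactly the bytes that would otherwise be lost.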

A smaller issue: I think b.set_filled(match_index); should be using index instead, so that it returns an accurate count of bytes read. (It doesn't really matter, as the bytes-read count isn't used, but read_cd could potentially rely on it to avoid the seeks needed to get the current offset.)

Another potential issue (though I don't believe it can happen with fixed buffer sizes): you do a read, get a match, and prepend the remaining bytes; then you call read again with a smaller buffer and get another match. At that point there are still bytes left in the prepend buffer, so when the new leftover bytes are "prepended" they are added to the end of the prepend buffer instead of the beginning, which corrupts the data.
I think prepend should check whether the buffer is empty and, if it's not, insert the new data at the front rather than extend.
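
A sketch of what I mean, using a hypothetical PrependBuffer type rather than the crate's actual field: new leftovers always go in front of whatever is still pending, so the byte order of the stream is preserved even when the buffer isn't empty.

```rust
/// Hypothetical stand-in for the reader's prepend storage.
#[derive(Default)]
struct PrependBuffer {
    bytes: Vec<u8>,
}

impl PrependBuffer {
    /// Puts `data` in front of any bytes still pending, instead of extending
    /// the end, because the new leftovers come earlier in the stream.
    fn prepend(&mut self, data: &[u8]) {
        let mut merged = Vec::with_capacity(data.len() + self.bytes.len());
        merged.extend_from_slice(data);
        merged.extend_from_slice(&self.bytes);
        self.bytes = merged;
    }

    /// Copies as many pending bytes as fit into `out` and returns the count.
    fn take(&mut self, out: &mut [u8]) -> usize {
        let n = out.len().min(self.bytes.len());
        out[..n].copy_from_slice(&self.bytes[..n]);
        self.bytes.drain(..n);
        n
    }
}
```

When the buffer is empty this behaves the same as extending it, so the fixed-buffer-size path shouldn't change.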

I think some unit tests for AsyncDelimiterReader with small buffer sizes might make things easier to test as well.
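
Something along these lines could serve as the test harness. It is only a sketch: ChunkLimited and read_sizes are hypothetical names, and the synchronous std::io::Read trait stands in for the async reader the crate actually uses. The wrapper caps every read at a few bytes, so a delimiter is forced to straddle two reads.

```rust
use std::io::{self, Read};

/// Wraps a reader and never returns more than `cap` bytes per read call.
/// `cap` must be non-zero, otherwise every read looks like EOF.
struct ChunkLimited<R> {
    inner: R,
    cap: usize,
}

impl<R: Read> Read for ChunkLimited<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = buf.len().min(self.cap);
        self.inner.read(&mut buf[..n])
    }
}

/// Drains `data` through the capped reader and records the size of each read,
/// which shows where the read boundaries fall for a given cap.
fn read_sizes(data: &[u8], cap: usize) -> io::Result<Vec<usize>> {
    let mut reader = ChunkLimited { inner: data, cap };
    let mut buf = [0u8; 64];
    let mut sizes = Vec::new();
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        sizes.push(n);
    }
    Ok(sizes)
}
```

With a cap of 4 and the input "PK\x05\x06xPK\x05\x06", the second delimiter starts at offset 5 and is split between the second and third reads. Looping the cap over 1..=8 in a unit test should cover the awkward sizes mentioned above.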
