
rc-zip's People

Contributors

fasterthanlime, folkertdev, github-actions[bot], joshwd36, messense, mufeedvh, muja


rc-zip's Issues

Use nom's streaming parsers

I was wondering why some parsers returned a Failure/Error rather than Incomplete - this is why!

Since we're doing our own buffering, especially when reading central directory headers, we want the streaming variant.

Support for extracting Apple's "endless" zip streams

I actually haven't tested your code to see if it already supports this, but since I just wrote something about Apple's special zip format (https://iosdev.space/@tempelorg/111993220533529890), I thought I'd look for hashtags on zip64 and found this.

So here's the deal: In order to allow >4GB files compressed without using zip64, Apple simply streams the entire file while leaving the sizes in the header set to zero. And the entry in the directory contains the size mod 2^32, of course. So you cannot predict the resulting size, but Apple's unzip can still decompress the files because it blindly decompresses the stream until its end marker.

It would be nice if you supported that too, and spread the word, as Apple has sadly not made this very public.

I'm happy to provide more info and sample files.
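For context, the arithmetic can be sketched in a few lines (illustrative Rust, not rc-zip code): the central directory only keeps the low 32 bits of the size.

```rust
// Illustrative only (not rc-zip code): the central directory keeps the
// true size modulo 2^32, so anything above 4 GiB wraps around.
fn stored_size(true_size: u64) -> u32 {
    (true_size & 0xFFFF_FFFF) as u32
}

fn main() {
    let true_size: u64 = 5_500_000_000; // a >4 GiB entry
    let stored = stored_size(true_size);
    assert_eq!(stored, 1_205_032_704); // 5_500_000_000 mod 2^32
    // The true size cannot be recovered from the directory alone; the
    // only way out is to decompress until the stream's end marker,
    // which is what Apple's unzip does.
    println!("true: {true_size}, stored: {stored}");
}
```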

Remove EOFNormalizer

Was added 5 years ago by yours truly. Was it ever needed? Who knows. I'm pretty sure EOF works normally on Windows right now though

More ways to zip slip jean

#13 found a dot-dot directory traversal vulnerability in the jean example program. The mitigation there of stripping ../ from paths is incomplete, and there are still other ways to escape the target directory.

I'm looking at commit d43c169.

Dot-dot

replace.zip

The mitigation for #13 treats file names as strings, and removes all instances of the substring ../ (or ..\ on Windows).
https://github.com/rust-compress/rc-zip/blob/d43c16991b30dce0e25862854b16883eb3dd80f0/samples/jean/src/main.rs#L224-L228

First, ../ should be stripped even on Windows. / is the zip file directory separator, regardless of the separator of the underlying filesystem (see appnote.txt 4.4.17).

Second, a file name may become a directory traversal after being transformed. For example, ..././escape.txt becomes ../escape.txt.

rm -f replace.zip
mkdir ...
touch .../escape.txt
zip -0 -X replace.zip ..././escape.txt
rm -r ...
cargo run -- unzip replace.zip
stat ../escape.txt
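The failure mode is easy to reproduce in isolation; this is a sketch of the flawed single-pass substring removal, not rc-zip's actual code:

```rust
// Sketch of the flawed mitigation (illustrative, not rc-zip's code):
// a single-pass substring removal of "../".
fn strip_dotdot_once(name: &str) -> String {
    name.replace("../", "")
}

fn main() {
    // Removing the "../" in the middle of "..././escape.txt" fuses the
    // leading ".." with the following "./" into a fresh "../" prefix.
    assert_eq!(strip_dotdot_once("..././escape.txt"), "../escape.txt");
    println!("single-pass stripping is not a fixed point");
}
```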

Symlink traversal

symlink.zip

One entry may be a directory symbolic link pointing outside the target directory, and a later entry may write through the symlink.

This is like CVE-2002-1216 in GNU Tar.

rm -f symlink.zip
ln -s ../ path
zip -0 -X --symlinks symlink.zip path
rm path
mkdir path
touch path/escape.txt
zip -0 -X -D symlink.zip path/escape.txt
rm -r path
cargo run -- unzip symlink.zip
stat ../escape.txt

Absolute path

absolute.zip

File names starting with / are treated as absolute and can write outside the target directory. On Windows, prefixes like C:\ and UNC paths like \\ComputerName\ may also work.

This is like CVE-2001-1269 in Info-ZIP UnZip.

import zipfile
with zipfile.ZipFile("absolute.zip", mode="w") as z:
    z.writestr("/tmp/escape.txt", "")
python3 absolute.py
cargo run -- unzip absolute.zip
stat /tmp/escape.txt

UnZip strips the absolute prefix in this case:

Archive:  absolute.zip
warning:  stripped absolute path spec from /tmp/escape.txt
 extracting: tmp/escape.txt
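A more robust direction (a sketch only, not a proposed patch to jean) is to sanitize per path component rather than by substring: strip root and drive prefixes the way UnZip does, and reject .. components outright. Note this still does not address the symlink case, which needs checks against the resolved path at extraction time.

```rust
use std::path::{Component, Path, PathBuf};

// Sketch of a component-based sanitizer (not rc-zip's actual code).
fn sanitize(name: &str) -> Option<PathBuf> {
    let mut out = PathBuf::new();
    for comp in Path::new(name).components() {
        match comp {
            Component::Normal(c) => out.push(c),
            Component::CurDir => {} // "." is harmless
            // Strip absolute prefixes ("/", and "C:\" etc. on Windows),
            // like UnZip's "stripped absolute path spec" behavior.
            Component::RootDir | Component::Prefix(_) => {}
            // Reject parent-directory components outright.
            Component::ParentDir => return None,
        }
    }
    Some(out)
}

fn main() {
    assert_eq!(sanitize("/tmp/escape.txt"), Some(PathBuf::from("tmp/escape.txt")));
    assert_eq!(sanitize("../escape.txt"), None);
    // "..." is just a directory literally named "...", which is harmless:
    assert_eq!(sanitize("..././escape.txt"), Some(PathBuf::from(".../escape.txt")));
}
```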

InvalidLocalHeader when parsing a zip file which can be extracted by unzip.

Seal.zip

$ unzip Seal.zip
Archive:  Seal.zip
warning [Seal.zip]:  1292 extra bytes at beginning or within zipfile
  (attempting to process anyway)
  inflating: Doc_0/Document.xml
  inflating: Doc_0/PublicRes_0.xml
  inflating: Doc_0/Pages/Page_0/Content.xml
  inflating: OFD.xml
$ ./target/debug/jean unzip Seal.zip
The application panicked (crashed).
Message:  called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: Format(InvalidLocalHeader) }
Location: src/libcore/result.rs:1165
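For reference, Info-ZIP copes with such archives by scanning forward past the leading garbage. A minimal sketch of that recovery (illustrative only, not rc-zip code), using the local file header signature PK\x03\x04 (0x04034b50, little-endian):

```rust
// Sketch: scan for the local file header signature "PK\x03\x04" to skip
// leading garbage, roughly what Info-ZIP's "extra bytes at beginning"
// recovery does. Not rc-zip code.
fn find_local_header(data: &[u8]) -> Option<usize> {
    data.windows(4).position(|w| w == b"PK\x03\x04")
}

fn main() {
    let mut data = vec![0xEEu8; 1292]; // 1292 bytes of leading garbage
    data.extend_from_slice(b"PK\x03\x04rest-of-entry");
    assert_eq!(find_local_header(&data), Some(1292));
}
```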

avoiding the get_reader function argument

creating an EntryReader requires passing a function that takes an offset and returns an instance of Read: https://github.com/rust-compress/rc-zip/blob/69b884f85085c5c99a846703f83ddc48e0d086ac/src/format/archive.rs#L218

This happens because a StoredEntry has no access to the original data source.

While the internal state machine should be able to work without owning the IO source, the public API is nicer if it wraps it. But that would also be dependent on the IO source. Some would support seeking, others would not.

So, ideally, the public API would allow writing this:

let file = File::open(matches.value_of("file").unwrap())?;
let reader = file.read_zip()?;
info(&reader);

for entry in reader.entries() {
  println!("Extracting {}", entry.name());
  let mut contents = Vec::<u8>::new();
  entry.read_to_end(&mut contents)?;
}

But it would also support an IO source that would do HTTP range requests to get the entries.

In lapin, the async crate exposed the state machine directly. Its use was low-level, just buffers in and out:
https://github.com/sozu-proxy/lapin/blob/0.16.0/async/examples/connection.rs

But the futures based API was wrapping the state machine and the IO source (that could be a TcpStream, or a SSL stream defined with openssl or rustls) inside a tokio transport:
https://github.com/sozu-proxy/lapin/blob/0.16.0/futures/src/transport.rs#L45-L137

feat: Re-add async interface

It was removed in previous versions, but I think we could re-add a version based only on tokio I/O traits.

This wouldn't accommodate things like io-uring, but there would probably be commonalities between the sync and async interfaces, and it might clean up the overall package.

bug: NTFS timestamps are parsed wrong

In the tests, utf8-winrar.zip has a file that we assert to have a modified date of 2017-11-06T13:09:26Z, but it should actually be 2017-11-06T21:09:27.867862500Z.

NanaZip confirms the expected timestamp.
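For anyone picking this up: NTFS timestamps are 100-nanosecond intervals since 1601-01-01T00:00:00Z, which is why sub-second precision like .867862500 is representable at all. A sketch of the conversion (illustrative, not rc-zip's parser):

```rust
// NTFS/Windows FILETIME: 100-ns ticks since 1601-01-01T00:00:00Z.
// The offset to the Unix epoch is the well-known 11_644_473_600 s.
const EPOCH_DIFF_SECS: u64 = 11_644_473_600;

// Convert NTFS ticks to (Unix seconds, subsecond nanoseconds).
fn ntfs_to_unix(ticks: u64) -> (i64, u32) {
    let secs = (ticks / 10_000_000) as i64 - EPOCH_DIFF_SECS as i64;
    let nanos = (ticks % 10_000_000) as u32 * 100; // 100-ns resolution
    (secs, nanos)
}

fn main() {
    // The Unix epoch itself, expressed as FILETIME ticks:
    assert_eq!(ntfs_to_unix(116_444_736_000_000_000), (0, 0));
    // Sub-second precision survives, unlike DOS timestamps (2 s
    // resolution), so fractions like .867862500 are exact:
    assert_eq!(ntfs_to_unix(116_444_736_008_678_625).1, 867_862_500);
}
```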

My five cents

I looked at your work. It seems more convenient to me than the "zip" crate (whose development has stalled). But I have a few comments.

  1. The StoredEntry structure does not implement the Clone/Copy traits. If it did, it would be possible to save the received StoredEntry somewhere and return to work with the archive later.

  2. There are no examples.

  3. The description of the get_reader argument is missing from the declaration
    pub fn reader<'a, F, R>(&'a self, get_reader: F) -> EntryReader<'a, R> where R: Read, F: Fn(u64) -> R
    I assume this function takes an offset, repositions the Read, and returns it.
    But it's not that simple.
    3.1. get_reader returns an object, i.e. transfers ownership somewhere inside the library. Does that mean I have to clone my Read object before giving it away? std::fs::File does not support cloning.
    3.2. Let's look at another example. It almost works, but not quite.

fn open1(archname: &dyn AsRef<Path>)
{
    let mut f = File::open(archname).unwrap();
    let arch = parse_zip(&mut f);
    let entry = &arch.entries()[1];

    let rdr = entry.reader(|offset| {
        f.seek(std::io::SeekFrom::Start(offset)).unwrap();
        f
    });
}

Here f cannot be used because the closure type is Fn, although it would have to be FnMut. move |offset| { ... } does not work either.
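One way to satisfy the Fn(u64) -> R bound is to build a fresh reader on each call instead of moving a single File into the closure. A std-only sketch with Cursor (with a real file, re-opening the path or File::try_clone inside the closure works the same way):

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

// Sketch: a closure that is Fn (not FnMut) because it produces a
// brand-new reader per call instead of mutating a captured one.
// Cursor stands in for the real data source here.
fn main() {
    let data = b"zip archive bytes...".to_vec();

    let get_reader = |offset: u64| {
        let mut c = Cursor::new(data.clone());
        c.seek(SeekFrom::Start(offset)).unwrap();
        c
    };

    let mut r = get_reader(4);
    let mut buf = String::new();
    r.read_to_string(&mut buf).unwrap();
    assert_eq!(buf, "archive bytes...");
}
```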

Fuzz this

Adding fuzz targets for all the functions that take byte slices is an easy way to assert that they don't panic unexpectedly. Even if it's unlikely to find buffer overflows or use after free in Rust code, in my experience there are at least one or two integer overflows that a fuzzer can find :)

cf. https://rust-fuzz.github.io/book/introduction.html for how to fuzz Rust libraries
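The core idea can be sketched without cargo-fuzz: feed mutated or truncated inputs and assert nothing panics. The parse function below is a deliberately buggy stand-in, not an rc-zip API:

```rust
use std::panic;

// Toy stand-in parser with a deliberate bug, NOT an rc-zip function:
// it reads a length prefix and slices without bounds checking.
fn parse(data: &[u8]) -> Option<&[u8]> {
    let len = *data.first()? as usize;
    Some(&data[1..1 + len]) // bug: may slice out of bounds and panic
}

fn main() {
    panic::set_hook(Box::new(|_| {})); // silence panic backtraces
    let seed = b"PK\x03\x04some entry data";
    let mut panics = 0;
    // Feed every truncated prefix: a cheap approximation of what a
    // fuzzer's minimized corpus would exercise.
    for end in 0..=seed.len() {
        let input = &seed[..end];
        if panic::catch_unwind(|| { let _ = parse(input); }).is_err() {
            panics += 1;
        }
    }
    assert!(panics > 0, "truncation should trigger the toy bug");
    println!("found {panics} panicking inputs");
}
```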

Support WebAssembly

Error when compiling to wasm32-unknown-unknown:

   Compiling rc-zip v0.0.1
error[E0277]: the trait bound `std::fs::File: ReadAt` is not satisfied
  --> /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/rc-zip-0.0.1/src/reader/read_zip.rs:64:36
   |
64 |         ReadAt::read_zip_with_size(self, size)
   |                                    ^^^^ the trait `ReadAt` is not implemented for `std::fs::File`
   |
   = note: required for the cast to the object type `dyn ReadAt`

Removing Chardet dependency (or switch to chardetng) to not use LGPL3

Hi,

I'm doing an "off the clock"/weekend project that I hope will become commercial (it's far from it right now), and I want to use rc-zip because it works really well for the zip files I'm handling. However, it depends on chardet, which is LGPL (presumably inherited from the original chardet), and that means I can't use it in a commercial project (given how Rust links libraries).

Is there any way to make rc-zip not use chardet? For example using https://crates.io/crates/chardetng (not sure if it covers the right things)

To put my money where my mouth is I'm more than happy to donate 100USD for this effort via PayPal (or donate somewhere if you prefer). I know it's not much (especially to change a dependency, create a new version and upload to crates.io) but given that my project is miles away from generating any revenue at all it's all I can afford right now really :) I guess it's more a token of appreciation.

I know the right way is to submit a PR but frankly my rust skills are probably too limited to do this right.

Regards,
Niklas

Reading a file entry (newbie question) - trying to use the examples

Hi,

Sorry, this is a bit of a newbie question - I'm trying to follow the read-file examples but can't seem to get them to work. I'm trying to open a zip file, loop through the entries, and pass the buffer of the file I want to quick-xml (without writing the file to disk). Opening and looping is super simple, but I fail at passing the buffer.

I tried:

let entry_reader = c
    .entry
    .sync_reader(|offset| positioned_io::Cursor::new_pos(&zipfile, offset));

But it says that sync_reader method doesn't exist.

I tried the other example:

rc_zip::reader::sync::EntryReader::new(sl.entry, |offset| {
    positioned_io::Cursor::new_pos(&zipfile, offset)
})
.read_to_string(&mut target)
.unwrap();

But it says that the reader is a private method.

The closest I come is:

let entry_reader = c.entry.reader(|offset| positioned_io::Cursor::new_pos(&zipfile, offset));

Which I think gives me a proper entry_reader, but I can't figure out how to turn it into a buffer that quick-xml can read - or how to read it at all.

This is all probably explained in the crate's documentation, but I'm still at the point where the automatically generated documentation is a bit Greek to me :)

This is how I normally read a buffer in quick xml:

let xml_file = File::open(filepath)?;
let buffer = BufReader::new(xml_file);
let mut reader = Reader::from_reader(buffer);
let mut buf = Vec::new();
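For what it's worth, the bridging is mostly a trait question: quick-xml's Reader::from_reader wants a BufRead, and any io::Read (which an entry reader should be) can provide one. A std-only sketch where Cursor stands in for the entry reader (it is not an rc-zip type):

```rust
use std::io::{BufReader, Cursor, Read};

// Std-only sketch: `entry_reader` stands in for whatever io::Read the
// zip entry gives you; it is NOT an rc-zip type.
fn main() {
    let entry_reader = Cursor::new(b"<root><a/></root>".to_vec());

    // Option 1: read the whole entry into memory. A &[u8] slice also
    // implements BufRead, so `Reader::from_reader(&contents[..])` works.
    let mut contents = Vec::new();
    BufReader::new(entry_reader).read_to_end(&mut contents).unwrap();
    assert_eq!(contents, b"<root><a/></root>");

    // Option 2 (streaming): skip the Vec entirely and hand
    // `BufReader::new(entry_reader)` to `Reader::from_reader` directly,
    // since BufReader implements BufRead.
}
```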

Any help will be greatly appreciated. Also, to put my money where my mouth is I will send anyone who helps me a beer tip via PayPal

feat: More efficient `AsyncRandomAccessFileCursor` implementation

The 128K buffer is probably okay, although perhaps overkill?

But more importantly, especially for evil tests that do 1-byte reads, let's have it read based on its internal buffer size, rather than the passed buffer.

This needs more logic to return data from the internal buffer on subsequent reads, rather than spawning a blocking task, as long as the internal buffer still has data. That essentially turns it into a sort of BufReader, but... that's a good place to do it.

Secondly, the Box::pin is probably wholly unnecessary; it could just be a JoinHandle, I think.
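The BufReader comparison can be made concrete with std's sync BufReader, which already serves small reads from a large internal buffer; this is just the sync analogue of the behavior described, not the actual AsyncRandomAccessFileCursor:

```rust
use std::io::{BufReader, Cursor, Read};

// Sync analogue of the desired behavior (not the real async cursor):
// one big read fills the 128K internal buffer, then subsequent 1-byte
// reads are served from it without touching the source again.
fn main() {
    let source = Cursor::new(vec![b'x'; 1 << 20]); // 1 MiB of data
    let mut reader = BufReader::with_capacity(128 * 1024, source);

    let mut one = [0u8; 1];
    reader.read_exact(&mut one).unwrap(); // triggers one 128K fill
    assert_eq!(reader.buffer().len(), 128 * 1024 - 1); // rest buffered

    reader.read_exact(&mut one).unwrap(); // served from the buffer
    assert_eq!(reader.buffer().len(), 128 * 1024 - 2);
}
```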

feat: support deflate64 compression method

~/bearcove/rc-zip-samples
❯ jean file nystatehealthcosts2016-2021.zip
Version made by: {MsDos v4.5}, required: {MsDos v4.5}
Encoding: utf-8, Methods: {Unsupported(9)}
4.00 GiB (128.77% compression) (1 files, 0 dirs, 0 symlinks)

We don't support Method 9 right now, but according to 7-Zip it's Deflate64.

Question: ArchiveReader::read() blocks?

Hi, this is a question and not really an issue. I'm looking at using rc-zip in an async/await environment. However, ArchiveReader::read() looks to be the only way to feed the reader file data, and it blocks during the read. Is there an alternate way to feed it data that I may have read using async reads directly? Is an async version of read() necessary?

Thanks for clarifying.

Bug: jean shows nystatehealthcosts2016-2021.zip as having a 4GiB entry when it's 5.5GiB

~/bearcove/rc-zip-samples
❯ jean file nystatehealthcosts2016-2021.zip
Version made by: {MsDos v4.5}, required: {MsDos v4.5}
Encoding: utf-8, Methods: {Unsupported(9)}
4.00 GiB (128.77% compression) (1 files, 0 dirs, 0 symlinks)

The sample file is publicly available data but to avoid potential S3 costs for the people who own the bucket, I won't link to it here.

Is the library alive?

Hi!

As far as I can see, the library is not maintained anymore. @fasterthanlime, do you still maintain it? The library's current state is a little unclear.

Thanks in advance!
