
rs-async-zip's Issues

Support for setting filename on `ZipEntryBuilder`

I have a use case where I'm unzipping one file and putting the files into a subdirectory of another zip.

It would be great if I could take the ZipEntry from one, convert it to a ZipEntryBuilder, and call .filename() on it.

Currently, the workaround I have to use involves reading all the information from the ZipEntry and constructing a new one from scratch, since the constructor is the only way to set the filename.
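
Something like the following is the API shape I'm after (entirely hypothetical; neither the conversion nor the .filename() setter exists today, and source_entry is just a placeholder):

// Hypothetical API being requested: convert an existing ZipEntry into a
// builder and override just the filename, keeping everything else.
let new_name = format!("subdir/{}", source_entry.filename());
let new_entry = ZipEntryBuilder::from(source_entry.clone())
    .filename(new_name)
    .build();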

Release timeline

Do you have a planned timeline for the next release?

Also, have you considered publishing it as 0.1.0 (instead of 0.0.10) since this is a big change from previous releases?

Some weird behaviour

Hey,
I tried to read a simple UTF8 txt file from a zip archive.

use async_zip::read::fs::ZipFileReader;
use tokio::io::AsyncReadExt;

#[tokio::main]
async fn main() {
    let zip = ZipFileReader::new("resources.zip".to_string()).await.unwrap();
    let (index, entry) = zip.entry("content.txt").unwrap();
    let mut entry_reader = zip.entry_reader(index).await.unwrap();

    let file_length = entry.uncompressed_size().unwrap();

    let mut buffer = String::with_capacity(file_length as usize);
    let bytes_read = entry_reader.read_to_string(&mut buffer).await.unwrap();
    println!("{}:{}", bytes_read, buffer);
}

I get an error that the file is not utf8-encoded:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: InvalidData, error: "stream did not contain valid UTF-8" }', src\main.rs:13:69

So I just did a hack to see what bytes actually are inside the file:

use async_zip::read::fs::ZipFileReader;
use tokio::io::AsyncReadExt;

#[tokio::main]
async fn main() {
    let zip = ZipFileReader::new("resources.zip".to_string()).await.unwrap();
    let (index, entry) = zip.entry("content.txt").unwrap();
    let mut entry_reader = zip.entry_reader(index).await.unwrap();

    let file_length = entry.uncompressed_size().unwrap();

    let mut buffer = Vec::with_capacity(file_length as usize);
    let bytes_read = entry_reader.read_to_end(&mut buffer).await.unwrap();
    println!("{}:{}", bytes_read, unsafe{ std::str::from_utf8_unchecked(&buffer) });
}

The console output is the following:

30:
vETF]�:�   �   � $     

Is there something I'm doing wrong? It seems really weird.
In case you are interested, I can send you the zip file.

Addition of inner type which doesn't pass through shutdown() calls

As per: https://users.rust-lang.org/t/stream-data-in-compress-and-stream-out/72521/6

Currently, the stream writer implementation calls shutdown() on the compression writer when attempting to close the entry.

This is done as the upstream compression crate does not implement any way to finish the encoding without calling shutdown().
However, their implementation of the shutdown also polls the inner writer's shutdown, thus chaining the shutdown call up to the generic writer.

I don't see any way to avoid this shutdown call, but feel free to comment below if you do.
So for the time being, an inner type wrapping the generic writer should be implemented which ignores calls to poll_shutdown().

Likely want to be implemented within: https://github.com/Majored/rs-async-zip/blob/main/src/write/compressed_writer.rs
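
As a rough illustration of that inner type (a sketch only, with hypothetical naming; it forwards writes and flushes but swallows shutdown):

use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::AsyncWrite;

// Sketch: forwards poll_write/poll_flush to the generic writer but turns
// poll_shutdown into a plain flush so the chained shutdown never reaches it.
struct ShutdownIgnorer<W>(W);

impl<W: AsyncWrite + Unpin> AsyncWrite for ShutdownIgnorer<W> {
    fn poll_write(mut self: Pin<&mut Self>, cx: &mut Context<'_>, buf: &[u8]) -> Poll<std::io::Result<usize>> {
        Pin::new(&mut self.0).poll_write(cx, buf)
    }

    fn poll_flush(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<std::io::Result<()>> {
        Pin::new(&mut self.0).poll_flush(cx)
    }

    fn poll_shutdown(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<std::io::Result<()>> {
        // Deliberately ignore the shutdown request; flushing is enough here.
        Pin::new(&mut self.0).poll_flush(cx)
    }
}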

Incorrect assumption in StoredZipEntry::data_offset

StoredZipEntry::data_offset computes the position of the compressed data for a particular entry by taking the position of the local file header and adding the header length and the trailing data length:

/// Returns the offset in bytes from where the data of the entry starts.
pub fn data_offset(&self) -> u64 {
    let header_length = SIGNATURE_LENGTH + LFH_LENGTH;
    let trailing_length = self.entry.filename.as_bytes().len() + self.entry.extra_field().len();
    self.file_offset + (header_length as u64) + (trailing_length as u64)
}

Unfortunately, the calculation of the trailing data length here is incorrect. Specifically, it's using the length of self.entry.extra_field() which is based on the extra_field in the central directory record. extra_field can have different lengths in the local file header and in the central directory record.

This leads to data_offset returning the wrong position and causes errors decompressing files. As far as I know, the only correct way to find data_offset is by reading the local file header to find the length of the extra_field following it.

(And if you're doing that, you can also read the length of the filename field from the local file header as well)
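
As an illustration, a minimal sketch of reading those lengths from the local file header itself (assuming a seekable tokio reader; the 30-byte header size and the 26/28 field offsets come from the ZIP specification):

use std::io::SeekFrom;
use tokio::io::{AsyncRead, AsyncReadExt, AsyncSeek, AsyncSeekExt};

// Sketch: derive the data offset from the lengths stored in the local file
// header rather than trusting the central directory's copy of extra_field.
async fn data_offset<R: AsyncRead + AsyncSeek + Unpin>(
    reader: &mut R,
    file_offset: u64,
) -> std::io::Result<u64> {
    // Filename and extra field lengths sit at offsets 26 and 28 of the LFH.
    reader.seek(SeekFrom::Start(file_offset + 26)).await?;
    let filename_len = reader.read_u16_le().await? as u64;
    let extra_field_len = reader.read_u16_le().await? as u64;
    // 30 = 4-byte signature + 26-byte fixed portion of the local file header.
    Ok(file_offset + 30 + filename_len + extra_field_len)
}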

`deflate decompression error` after around 20% of the file is decompressed

I'm currently migrating from zip to async_zip because my program uses tokio IO everywhere except for zip files. I'm using the GitHub main branch, not crates.io, because of #64. Here is the function I wrote for extracting a given reader to an output directory:

/// Extract the `input` zip file to `output_dir`
pub async fn extract_zip(
    input: impl AsyncRead + AsyncSeek + Unpin,
    output_dir: &Path,
) -> Result<()> {
    let mut zip = ZipFileReader::new(input).await?;
    for i in dbg!(0..zip.file().entries().len()) {
        dbg!(i);
        let entry = zip.file().entries()[i].entry();
        let path = output_dir.join(entry.filename());

        if entry.dir() {
            create_dir_all(&path).await?;
        } else {
            if let Some(up_dir) = path.parent() {
                if !up_dir.exists() {
                    create_dir_all(up_dir).await?;
                }
            }
            copy(&mut zip.entry(i).await?, &mut File::create(&path).await?).await?;
        }
    }
    Ok(())
}

It's almost identical to your example extractor, minus the path sanitisation, because I trust the download sources (for now at least).

I was getting deflate decompression errors at seemingly random places. So I tried debugging it by printing out the indices and the total length (as shown in the code) and I came to a weird conclusion. It seems decompression fails at around 15%-20% of the total length. I have no idea what's going on, and thanks in advance for any help.

Regression: cannot loop over entries in ZIP archives after upgrading to v0.0.8

I'm using this crate to loop over entries in ZIP archives and extract them. It works as expected on version 0.0.7. However, after upgrading to version 0.0.8, the same function sometimes fails with the error Encountered an unexpected header (actual: 0x0, expected: 0x6054b50). I noticed that those "bad" archives have one thing in common: they contain no directory entries but still have a path structure, i.e. "folder/file.ext" exists, but "folder" does not. I believe this is what caused the regression.

Fails to compile as a dependency

Looks like the dependency async_io_utilities was yanked from crates.io, resulting in this crate no longer compiling as a dependency in other crates.

error: no matching package named `async_io_utilities` found
location searched: registry `crates-io`
required by package `async_zip v0.0.9`
    ... which satisfies dependency `async_zip = "^0.0.9"` of package ...

crypak / p4k file / zip64

So, I realize this isn't in the realm of responsibility for a zip archive library, but I am trying to open a p4k file (from Star Citizen) using your lib and getting an unexpected header. My research indicates (even from CryEngine's own developer docs) that a CryPak/p4k file should just be a zip with some zstd or deflate compression, with some files encrypted while others are not even compressed. From what I have been able to find, I should be able to just list the files with standard zip capability, and yet I am getting this unexpected header before I even try to extract.

    Finished dev [unoptimized + debuginfo] target(s) in 4.70s
     Running `target\debug\scprospector.exe`
[src\lib.rs:6] &file = tokio::fs::File {
    std: File {
        handle: 0x00000000000000dc,
        path: "\\\\?\\C:\\Users\\David\\repos\\scprospector\\Data.p4k",
    },
}
file open; trying to open as zip
Error: Encountered an unexpected header (actual: 0x99df8764, expected: 0x2014b50).
error: process didn't exit successfully: `target\debug\scprospector.exe` (exit code: 1)

Looking at some other open-source code that successfully opens the file (one project in .NET and another in Python), those authors seem to open it with some kind of encryption key. Maybe this is happening because async_zip doesn't support encryption yet? I'm trying to find out what kind of encryption it is; maybe ZipCrypto, but I'm not sure.

Obviously this isn't something I expect you to solve, but I wondered if you have any insight into why this might happen. If not, that is ok too. Thanks!

delete me please

Edit: I'm not sure this is an issue. I should probably be sanitizing file names before I pass them to EntryWriter!!

Sorry for creating the issue prematurely. Please delete me :)

Since 0.0.13 EntryStreamWriter does not implement tokio::io::AsyncWrite

Hello,

I migrated to 0.0.13. Since this release, an EntryStreamWriter constructed from an async_zip::tokio::write::ZipFileWriter no longer implements tokio::io::AsyncWrite. This is annoying when trying to use it in the tokio ecosystem.
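
For now I'm working around it roughly like this (an untested sketch based on the compat pattern mentioned in another issue here; zip_writer, entry_builder and source are placeholders), adapting the writer back with tokio-util:

use tokio_util::compat::FuturesAsyncWriteCompatExt;

// Workaround sketch: the stream writer still implements futures' AsyncWrite,
// so wrap it to get a tokio::io::AsyncWrite again for tokio::io::copy.
let mut entry_writer = zip_writer
    .write_entry_stream(entry_builder)
    .await?
    .compat_write();
tokio::io::copy(&mut source, &mut entry_writer).await?;
// Unwrap the compat adaptor again before closing the entry.
entry_writer.into_inner().close().await?;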

Best regards and thanks for this useful crate.

Changelog

It would be useful to have a changelog explaining the breaking changes between versions.

OwnedEntryStreamWriter

An alternative API to write_entry_stream:

Helpful if the writer ever needs to be stored, due to the inherent issues with self-referential structs.

Sketched out:

impl ZipFileWriter {
    async fn to_entry_stream<E: Into<ZipEntry>>(self, entry: E) -> Result<OwnedEntryStreamWriter, ZipFileWriter> {
        ...
        Ok(OwnedEntryStreamWriter { outer: self, ... })
    }
}

impl OwnedEntryStreamWriter {
    async fn to_zip_writer(self) -> Result<ZipFileWriter, OwnedEntryStreamWriter> {
        self.close().await?;
        Ok(self.outer)
    }
}

Zip inside zip with stored compression fails to open

outer.zip
I'm getting "failed to open reader: UnexpectedHeaderError(1969404127, 33639248)" when trying to open that file. (It was a larger zip but I created this test case)

I looked into why it was happening, and it looks like the reader searches the file from end-0xffff to the end for the end-of-central-directory magic number. The problem is that since the inner zip isn't compressed by the outer zip, the search actually finds the inner zip's end-of-central-directory record instead of the outer one.
It also runs into problems if an arbitrary file happens to contain that byte sequence, but that's probably a bit more rare.
bad_bytes.zip

I think the fix is to just make the AsyncDelimiterReader search starting from the end instead of the front of that range.
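
To illustrate, a minimal sketch of scanning that tail range backwards for the end-of-central-directory signature (a plain in-memory search, not the crate's actual AsyncDelimiterReader):

// Sketch: search a tail buffer from the end so the outermost EOCD record is
// found instead of one belonging to a stored inner zip.
fn find_eocd_from_end(tail: &[u8]) -> Option<usize> {
    // "PK\x05\x06" as stored on disk (little-endian 0x06054b50).
    const SIG: [u8; 4] = [0x50, 0x4B, 0x05, 0x06];
    if tail.len() < SIG.len() {
        return None;
    }
    (0..=tail.len() - SIG.len()).rev().find(|&i| tail[i..i + SIG.len()] == SIG)
}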

ZipEntry::dir removed

The method ZipEntry::dir, which checks whether the entry represents a directory, was removed in e4a0aa5. It's easy enough to reimplement as entry.filename().ends_with('/') but I'm curious why it was removed in the first place. If the removal was an oversight, it would be nice if it could be added back since it's convenient to have.

`tokio::io::AsyncRead`(`Ext`) doesn't seem to be implemented for `async_zip::tokio::read::ZipEntryReader`

When I use async_zip::tokio::read::seek::ZipFileReader::reader_with_entry() to get a reader, I don't seem to be able to use tokio's AsyncRead or AsyncReadExt on it, despite using the tokio-specific module. This is fine when I only want to read (since I can use the read_to_<end/string>_checked() aliases), but when I'm using something like copy(), having AsyncRead implemented is required.

API changes to .entries() and ZipFileReader::new

I've been impacted quite a lot by the removal of the entries() iterator and by ZipFileReader now asking for an owned Vec.

The first one is the more important to me: I work with nested zip files which I need to search by matching the name, and with the entries() API I could just use find() and get the index.

Something like this was possible before:
let (file_index, zip_entry) = zip.entries().iter().find_map(|entry| { .... })?;
With the second one I guess I can get away with a .to_vec() and call it a day (maybe leaving a bit of performance on the table).
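
In the meantime, something along these lines seems to cover the first case with the current accessors (an untested sketch; the filename being matched is just an example):

// Untested sketch: enumerate the borrowed entries slice in place of the
// removed entries() iterator to recover the index alongside the entry.
let (file_index, zip_entry) = zip
    .file()
    .entries()
    .iter()
    .enumerate()
    .find(|(_, stored)| stored.entry().filename().ends_with("inner.zip"))?;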

Is there any way you could bring it back?

Thanks in advance for your work and for your help, this is a really nice lib overall.

Replace small reads in central directory parsing with one large read

In some zip files with large central directories, the process of initially loading the zip file can cause lots of small reads (relevant code). This can be quite slow if reads to the underlying reader are expensive.

Since we already know the size of the central directory before we start parsing it (and we know we're going to read the whole thing), we could just read it all at once.

This should be as simple as reading eocdr.size_cent_dir bytes into a temporary buffer and passing that to crate::read::cd instead of passing &mut reader as below.

reader.seek(SeekFrom::Start(eocdr.cent_dir_offset.into())).await?;
let entries = crate::read::cd(&mut reader, eocdr.num_of_entries.into()).await?;
Ok(ZipFile { entries, comment, zip64: false })
}
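
Concretely, the change could look roughly like this (a sketch reusing the names from the excerpt above; it assumes crate::read::cd accepts any AsyncRead, and std::io::Cursor works here because tokio implements AsyncRead for it):

// Sketch: one large read of the whole central directory, then parse it from
// an in-memory cursor instead of issuing many small reads against `reader`.
reader.seek(SeekFrom::Start(eocdr.cent_dir_offset.into())).await?;
let mut cd_bytes = vec![0u8; eocdr.size_cent_dir as usize];
reader.read_exact(&mut cd_bytes).await?;
let mut cd_reader = std::io::Cursor::new(cd_bytes);
let entries = crate::read::cd(&mut cd_reader, eocdr.num_of_entries.into()).await?;
Ok(ZipFile { entries, comment, zip64: false })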

An alternative is for users to wrap their reader in a BufReader before passing it to ZipFileReader, but this isn't ideal as CompressedReader in this library already uses a BufReader internally. Also, users likely don't have knowledge of the central directory sizes of their files beforehand so they can't tune buffer sizes.

I can make a pull request with the solution I described above if that sounds good to you.

Thanks!

Merge process

Just a suggestion - when merging a PR, the "Squash and Merge" option tends to lead to a cleaner git history - especially if the individual commits within a pull request aren't particularly descriptive. This takes all the commits in a PR, squashes them into one named after the PR title, and puts it on top of main. You can also configure this to include the PR description.

If you do want to keep all individual commits from a PR, the "Rebase and Merge" option will stack them all on top of main instead of potentially interleaving them with other commits, which is what happens with merge commits. For example, some commits from #66 are displayed before 28a932f even though you merged #66 after 28a932f. "Rebase and Merge" avoids this.

Both "Squash and Merge" and "Rebase and Merge" make the git history more readable by avoiding having several merge commits on the main branch. I'd recommend using either of those instead of merge commits.

Unexpected header error

I'm trying to open this ZIP archive, but async_zip::read::fs::ZipFileReader::new() returns UnexpectedHeaderError:

Error: Encountered an unexpected header (actual: 0x0, expected: 0x6054b50).

All other tools I tried work well with this archive. I'm using async_zip version 0.0.8 from crates.io, rust 1.62.

Specification compliance tracking

Structure

  • Appropriate representations of headers
  • Appropriate header delimiter constants
  • Reading local file headers
  • Writing local file headers
  • Reading central directory headers
  • Writing central directory headers
  • Reading entry names & extra fields
  • Writing entry names & extra fields
  • Reading central directory entry comment
  • Writing central directory entry comment
  • Reading ending file comment
  • Writing ending file comment

Features

  • Reading entries with data descriptors
  • Writing entries with data descriptors
  • Reading entries with data descriptors (without de facto delimiter)
  • Writing entries with data descriptors (without de facto delimiter)
  • Versioning for reading
  • Versioning for writing
  • ZIP64 for reading
  • ZIP64 for writing
  • Encryption for reading
  • Encryption for writing

`ZipEntryReader::read_to_*_checked` is unusable

Both of them take a mutable reference to self and an immutable reference to a ZipEntry:

    pub async fn read_to_end_checked(&mut self, buf: &mut Vec<u8>, entry: &ZipEntry) -> Result<usize>;
    pub async fn read_to_string_checked(&mut self, buf: &mut String, entry: &ZipEntry) -> Result<usize>;

The mutable reference to self can only be obtained if we also hold a mutable reference to the ZipFileReader.
But to hold an immutable reference to the ZipEntry, we need to hold a shared reference to that same ZipFileReader.

Thus, this API is unusable.
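
For what it's worth, cloning the stored entry up front seems to sidestep the conflict (an untested sketch based on the fs-reader example from another issue here):

// Untested sketch: clone the stored entry so the shared borrow of the
// reader's entry list ends before the mutable borrow of the reader begins.
let stored = zip.file().entries().get(0).unwrap().clone();
let mut reader = zip.entry(0).await.unwrap();
let mut buf = Vec::new();
reader.read_to_end_checked(&mut buf, stored.entry()).await.unwrap();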

File permissions while writing an entry

Thanks so much for creating this awesome lib.

I have been trying to archive some executables and I am able to do that, but I want to preserve the executable permission. I couldn't find any obvious API for setting file permissions on the entries being written. I am pretty new to Rust, so excuse me if I missed anything. If there's an API to do this, could you please point me towards it?

P.S. I also have no idea regarding the zip file format.

Support for other runtimes

Support other runtimes by:

  • Changing all instances of tokio::io::{AsyncRead, AsyncWrite, AsyncSeek} and their extensions to those found inside the futures crate
  • Moving from the tokio feature to the futures-io feature of async_compression
  • Creating a submodule tokio that mirrors the main API, but using tokio_util::compat::Compat to operate on Tokio's types instead
    • The fs module would be exclusive to this tokio module
    • Caveat: Compat does not provide an implementation of AsyncSeek. A small newtype wrapper implementing AsyncSeek could be included in this crate for internal use. Smol's async-compat provides an implementation, but it also tampers with tokio's runtime, so it might not be the best idea.
  • Gating the tokio module behind a feature flag, perhaps enabled by default
  • Applying the same changes to your sister crate async_io_utilities

I'll be modifying the crate for use with my project that uses async-std, so if this is something you're interested in, I'd be happy to polish the API and submit a PR.

UpstreamReadError on zip containing another zip

Hi, I've been using the crate for copying directories over a zip stream and I noticed that the read::stream::ZipFileReader fails in case the compressed stream contains a file that is already compressed by deflate method.

use async_zip::read::stream::ZipFileReader;
use tokio::fs::File;

#[tokio::main]
async fn main() {
    // Fails when file contains already compressed file
    // e.g. epub, odt, zip
    let mut file = File::open("with-zipfile.zip").await.unwrap();
    let mut zip = ZipFileReader::new(&mut file);
    while !zip.finished() {
        if let Some(reader) = zip.entry_reader().await.unwrap() {
            let entry = reader.entry();
            println!("{:?}", entry.name());
            let mut out = vec![];
            reader.copy_to_end_crc(&mut out, 1024).await.unwrap();
        }
    }
}
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: UpstreamReadError(Custom { kind: Other, error: DecompressError(General { msg: None }) })', src/main.rs:16:47
stack backtrace:
   0: rust_begin_unwind
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
   1: core::panicking::panic_fmt
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
   2: core::result::unwrap_failed
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/result.rs:1616:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/result.rs:1298:23
   4: test_zip::main::{{closure}}
             at ./src/main.rs:16:13
   5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/future/mod.rs:80:19
   6: tokio::park::thread::CachedParkThread::block_on::{{closure}}
             at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/park/thread.rs:267:54
   7: tokio::coop::with_budget::{{closure}}
             at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/coop.rs:102:9
   8: std::thread::local::LocalKey<T>::try_with
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/thread/local.rs:399:16
   9: std::thread::local::LocalKey<T>::with
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/thread/local.rs:375:9
  10: tokio::coop::with_budget
             at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/coop.rs:95:5
  11: tokio::coop::budget
             at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/coop.rs:72:5
  12: tokio::park::thread::CachedParkThread::block_on
             at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/park/thread.rs:267:31
  13: tokio::runtime::enter::Enter::block_on
             at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/runtime/enter.rs:152:13
  14: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
             at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/runtime/scheduler/multi_thread/mod.rs:79:9
  15: tokio::runtime::Runtime::block_on
             at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/runtime/mod.rs:492:44
  16: test_zip::main
             at ./src/main.rs:10:5
  17: core::ops::function::FnOnce::call_once
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/ops/function.rs:227:5

And the file I'm trying to send:

$ zipinfo with-zipfile.zip 
Archive:  with-zipfile.zip
Zip file size: 89746 bytes, number of entries: 2
drwxr-xr-x  2.0 unx        0 bx stor 22-Sep-17 15:50 with-zipfile/
-rw-r--r--  2.0 unx    89326 bX defN 22-Sep-17 15:50 with-zipfile/olives.zip
2 files, 89326 bytes uncompressed, 89356 bytes compressed:  0.0%

I did some experiments and it seems to happen with many file formats that use deflate (specifically in the "normal" mode), for example: epub or odt.

Why would it matter that the stream contains a file that is itself already compressed? The DecompressError suggests that the error is somewhere in the async-compression/flate2 call, but I couldn't narrow down where and why exactly. Some help would be appreciated!

Local file header with no size information

When a data descriptor is used (primarily for streaming compression), async_zip currently handles the CRC correctly, but it does not do anything in particular about the size information in local file headers, resulting in zero compressed and uncompressed sizes. This should be either documented, or preferably handled by the public interface so that ZipEntry::[un]compressed_size returns Option<u32> or similar.

[Question][Offtopic] Async Zip Compat

Hi Harry!

This crate is absolutely wonderful, it saved me over a year ago when I was too dumb to figure out async_compression, and your package (back at 0.0.7) wrapped it and did all the dirty work for me.

Now I just bumped the package to 0.0.15, to take advantage of some of the compression algorithms being behind feature flags, and improve compile times, but I noticed a couple things:

  1. It no longer wraps async_compression. Why not?
  2. It has introduced this Compat thing, which I did not need before.

I was hoping you wouldn't mind explaining this Compat situation and why it is now needed. In the process I hope to gain a better understanding of the Rust async ecosystem. Thank you!!

For reference, here is my function that uses your library, and the comments IN CAPS detail the changes I had to make to bump to 0.0.15

#[get("/download")]
pub async fn directories_download(
    db_pool: web::Data<DbPool>,
    blob_storage: web::Data<BlobStorage>,
    query: web::Query<DownloadDirectoryRequest>,
    id: Identity,
) -> StreamingResponse<ReaderStream<impl AsyncRead>> {
    tracing::Span::current().record("query", query.as_value());
    let user_id =
        require_user_login(id).map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?;

    // Prepare a stream that will receive the compressed bytes
    let (mut compressed_tx, compressed_rx) = tokio::io::duplex(1024);
    // I NEEDED TO `.compat()` THIS
    let compressed_tx = compressed_tx.compat();

    // Get a list of hashes and paths that we need to compress
    let files_to_zip = {
        let mut files = directories_get_children_hashes_and_paths_for_directory_download(
            db!(db_pool),
            user_id,
            query.directory_entry_id,
        )
        .await?;

        if query.deduplicate == Some(true) {
            deduplicate(&mut files)
        };

        files
    };

    tokio::spawn(async move {
        // Prepare our ZipFileWriter
        let mut zip_archive = ZipFileWriter::new(compressed_tx);

        for (hash, path) in files_to_zip {
            let (mut uncompressed_tx, mut uncompressed_rx) = tokio::io::duplex(1024);

            let mut entry_writer = zip_archive
                .write_entry_stream(ZipEntryBuilder::new(path.into(), Compression::Deflate))
                .await
                .expect("Couldn't create an EntryStreamWriter")
                // NEEDED `.compat_write()` HERE
                .compat_write();

            // Begin streaming into the channel
            let blob_storage = blob_storage.clone();
            tokio::spawn(async move {
                blob_storage
                    .retrieve_file_streaming(&hash, &mut uncompressed_tx)
                    .await
                    .expect("blob storage could not retrieve file");
            });

            // Copy from channel into the entry_writer
            tokio::io::copy(&mut uncompressed_rx, &mut entry_writer)
                .await
                .expect("couldn't copy unompressed bytes into the EntryStreamWriter");

            // finalize this file's compression
            entry_writer
                // NEEDED TO GET THE `EntryStreamWriter` BACK OUT
                // TO BE ABLE TO `.close()` IT
                .into_inner()
                .close()
                .await
                .expect("couldn't shutdown the EntryStreamWriter");
        }

        // When all uncompressed_streams have completed we can close off
        // the ZipFileWriter
        zip_archive
            .close()
            .await
            .expect("couldn't close the zip file");
    });

    Ok(StreamingBody(ReaderStream::new(compressed_rx)))
}

Result unwrap in ZipEntryReader::copy_to_end_crc

The ZipEntryReader::copy_to_end_crc method panics when working with a specific zip file. I don't know whether the file is standards-compliant. If the cause of the bug is that it isn't, it should be expressed as a returned error instead of a panic.

Steps to reproduce

Cargo.toml:

[package]
name = "async-zip-unwrap-min"
version = "0.1.0"
edition = "2021"

[dependencies]
async_zip = "=0.0.3"

[dependencies.tokio]
version = "=1.14.0"
features = ["macros", "rt-multi-thread"]

main.rs:

use {
    async_zip::read::stream::ZipFileReader,
    tokio::{
        fs::File,
        io,
    },
};

#[tokio::main]
async fn main() {
    let mut zip_file = File::open("BizHawk-2.7-win-x64.zip").await.expect("panic not happening here");
    let mut zip_file = ZipFileReader::new(&mut zip_file);
    while let Some(entry) = zip_file.entry_reader().await.expect("panic not happening here") {
        entry.copy_to_end_crc(&mut io::sink(), 64 * 1024).await.expect("panic not happening here");
    }
}

The file BizHawk-2.7-win-x64.zip is taken from https://github.com/TASEmulators/BizHawk/releases/tag/2.7.

Output:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: DecompressError(General { msg: None }) }', C:\Users\fenhl\.cargo\registry\src\github.com-1ecc6299db9ec823\async_zip-0.0.3\src\read\mod.rs:176:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'Not all bytes of this reader were consumed before being dropped.', C:\Users\fenhl\.cargo\registry\src\github.com-1ecc6299db9ec823\async_zip-0.0.3\src\read\mod.rs:208:13
stack backtrace:

(stack backtrace for the 2nd panic omitted since that's not the bug being discussed here)

make filename an owned type

I'm finding it a bit hard to use this library when I need to parse more than a single zip, given that it takes a single &'static str value. This library seems to work great, but only in the case where you have a hard-coded filename. As soon as you try to read input from a user interface you run into lifetime errors, since you can't create a &'static str from owned String types.

This would go away if filename were an owned type, though I could be missing something. If it is possible today, can you share some examples where collecting a list of paths, or even a single path, from user input is possible?

Error reading archive

Unable to read an archive using the code

    let mut file = tokio::fs::File::open(zip_file).await.unwrap();
    let mut zip = ZipFileReader::new(&mut file).await.unwrap();

    let entry = zip.file().entries().get(0).unwrap().clone();
    let mut string = String::new();
    let mut reader = zip.entry(0).await.unwrap();
    let txt = reader.read_to_string_checked(&mut string, entry.entry()).await.unwrap();
    println!("{}", txt);

Error is

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: UpstreamReadError(Custom { kind: InvalidData, error: "stream did not contain valid UTF-8" })'

This happens on a simple zip file with two files (a.txt and b.txt) having content

01234567890123456789012345678901234567890123456789

Extracting the zip file without crc check shows the content of a.txt as

45678901234567890123456789
PK��
�����XC0V�4�^3���3�

Looks like a regression in v0.0.10. It works correctly in v0.0.9.

Inefficient/incorrect approach to finding the zip64 EOCD locator

The current code first searches for the EOCDR for the zip file and then searches for a zip64 EOCD locator:

// First find and parse the EOCDR.
log::debug!("Locate EOCDR");
let eocdr_offset = crate::read::io::locator::eocdr(&mut reader).await?;
reader.seek(SeekFrom::Start(eocdr_offset)).await?;
let eocdr = EndOfCentralDirectoryHeader::from_reader(&mut reader).await?;
log::debug!("EOCDR: {:?}", eocdr);
let comment = crate::read::io::read_string(&mut reader, eocdr.file_comm_length.into()).await?;
// Find the Zip64 End Of Central Directory Locator. If it exists we are now officially in zip64 mode
let zip64_eocdl_offset = io::locator::zip64_eocdl(&mut reader).await?;
let zip64 = zip64_eocdl_offset.is_some();
let zip64_eocdr = if let Some(locator_offset) = zip64_eocdl_offset {

As far as I can tell, this repeated search is unnecessary because the zip64 EOCD locator is of a fixed size and immediately precedes the EOCDR:

   4.3.6 Overall .ZIP file format:

      [local file header 1]
      [encryption header 1]
      [file data 1]
      [data descriptor 1]
      . 
      .
      .
      [local file header n]
      [encryption header n]
      [file data n]
      [data descriptor n]
      [archive decryption header] 
      [archive extra data record] 
      [central directory header 1]
      .
      .
      .
      [central directory header n]
      [zip64 end of central directory record]
      [zip64 end of central directory locator] 
      [end of central directory record]

I think that a better approach may be to:

  1. Find the EOCDR as normal (using the existing search code)
  2. Look at the 20 bytes immediately before the EOCDR. If the signature matches the zip64 EOCD locator signature, then that's our locator. Otherwise the zip file is not a zip64
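
A sketch of what step 2 could look like (assuming a seekable tokio reader; the 20-byte locator length and the 0x07064b50 signature come from the ZIP specification):

use std::io::SeekFrom;
use tokio::io::{AsyncRead, AsyncReadExt, AsyncSeek, AsyncSeekExt};

// Sketch: the zip64 EOCD locator is a fixed 20 bytes immediately before the
// EOCDR, so a single signature check replaces the second backwards search.
async fn zip64_locator_offset<R>(reader: &mut R, eocdr_offset: u64) -> std::io::Result<Option<u64>>
where
    R: AsyncRead + AsyncSeek + Unpin,
{
    const LOCATOR_LENGTH: u64 = 20;
    const LOCATOR_SIGNATURE: u32 = 0x07064b50;
    if eocdr_offset < LOCATOR_LENGTH {
        return Ok(None);
    }
    let candidate = eocdr_offset - LOCATOR_LENGTH;
    reader.seek(SeekFrom::Start(candidate)).await?;
    if reader.read_u32_le().await? == LOCATOR_SIGNATURE {
        Ok(Some(candidate))
    } else {
        Ok(None)
    }
}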

The current implementation has a few issues:

  1. If the zip64 EOCD locator signature is in the comment at the end of the EOCDR, that can cause incorrect parsing
  2. If the EOCDR has the largest possible comment, the zip64 EOCD locator will not be found even if it exists (because of the way the searches are implemented)
  3. Both of the searches (EOCDR and zip64 EOCD locator) are implemented on top of locate_record_by_signature which searches up to the last ~64kb of a zip file. This can cause a repeated unnecessary search of the last 64kb which can be a problem if reads to the underlying reader are expensive.

cc @skairunner

Feature request: support reading archives with utf8-incompatible metadata

Motivation

Although the standard has supported UTF8 since 2006, a lot of legacy software still creates archives with utf8-incompatible metadata. In particular, Windows Explorer's built-in ZIP archiver encodes filenames with the encoding defined by the system locale setting. (For example, CP437 for English, CP936/GBK for Simplified Chinese, CP950/Big5 for Traditional Chinese. This is still the case on Windows 11.)

A ZIP library should be expected to read and extract archives with utf8-incompatible metadata.

Current Status

ZipFileReader as of version 0.0.11 returns UpstreamReadError: stream did not contain valid UTF-8 if the archive has utf8-incompatible metadata.

let filename = crate::read::io::read_string(&mut reader, header.file_name_length.into()).await?;
let compression = Compression::try_from(header.compression)?;
let extra_field = crate::read::io::read_bytes(&mut reader, header.extra_field_length.into()).await?;
let extra_fields = parse_extra_fields(extra_field)?;
let comment = crate::read::io::read_string(reader, header.file_comment_length.into()).await?;

Expected behavior

  1. ZipFileReader should be able to read archives with utf8-incompatible metadata.
  2. If an entry's UTF8 flag is set, ZipFileReader should try to parse its metadata as Rust string.
  3. If an entry's UTF8 flag is not set, ZipFileReader should either return the raw bytes or try to parse the metadata with a default encoding. In the latter case, it might be useful to also return the raw bytes allowing the caller to try a different encoding.
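
As a sketch of expected behaviours 2 and 3 (hypothetical types; this is not the crate's current API):

// Hypothetical sketch: only decode as UTF-8 when the entry's UTF8 flag
// (general purpose bit 11) is set, otherwise hand the raw bytes back.
enum ZipString {
    Utf8(String),
    Raw(Vec<u8>),
}

fn parse_filename(raw: Vec<u8>, utf8_flag_set: bool) -> ZipString {
    if utf8_flag_set {
        // Lossy conversion here mirrors what zip-rs does for flagged entries.
        ZipString::Utf8(String::from_utf8_lossy(&raw).into_owned())
    } else {
        // The caller can try CP437, GBK, etc. with whatever encoding crate it uses.
        ZipString::Raw(raw)
    }
}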

Other ZIP libraries

zip-rs

zip-rs performs a lossy conversion if the entry has the UTF8 flag set. Otherwise, it tries to parse the metadata with CP437.

https://github.com/zip-rs/zip/blob/0dcc40bee0179d9e841622f6c1a2217173b69951/src/read.rs#L683-L690

Python

zipfile module in Python's standard library has the same behavior as zip-rs, but allows the caller to provide an optional encoding to replace the default CP437.

https://github.com/python/cpython/blob/88a1e6db0ff7856191a5d63d3d26a896f3ae5885/Lib/zipfile.py#L1401-L1406

Test Archives

The attached archives contain an empty file with the same name. One is encoded with GBK, the other with UTF8.
gbk.zip
utf8.zip

List of caveats for non-seekable streams

It may be a good idea to add a few more notes to your list of caveats about decompressing from a non-seekable stream.

From Wikipedia:

Because ZIP files may be appended to, only files specified in the central directory at the end of the file are valid. Scanning a ZIP file for local file headers is invalid (except in the case of corrupted archives), as the central directory may declare that some files have been deleted and other files have been updated.

Maybe consider adding a note about deleted and updated files to the list here:

//! # Considerations
//! As the central directory of a ZIP archive is stored at the end of it, a non-seekable reader doesn't have access
//! to it. We have to rely on information provided within the local file header which may not be accurate or complete.
//! This results in:
//! - No file comment being avaliable (defaults to an empty string).
//! - No internal or external file attributes being avaliable (defaults to 0).
//! - The extra field data potentially being inconsistent with what's stored in the central directory.
//! - None of the following being avaliable when the entry was written with a data descriptor (defaults to 0):
//! - CRC
//! - compressed size
//! - uncompressed size
//!

Bug: Unicode broken?

Hi!

This minimal example:

// [dependencies]
// async_zip = "0.0.6"
// tokio = { version = "1", features = ["full"] }

use async_zip::write::{EntryOptions, ZipFileWriter};
use async_zip::Compression;
use tokio::io::AsyncWriteExt;

#[tokio::main]
async fn main() {
    let mut output_file = tokio::fs::File::create("ö.zip").await.unwrap();
    let mut output_writer = ZipFileWriter::new(&mut output_file);

    let filename = "öäöääö.txt".to_string();
    let entry_options = EntryOptions::new(filename, Compression::Stored);
    let mut entry_writer = output_writer
        .write_entry_stream(entry_options)
        .await
        .unwrap();

    let data = "hello öäöääö".to_string();
    tokio::io::copy(&mut data.as_bytes(), &mut entry_writer)
        .await
        .unwrap();

    entry_writer.close().await.unwrap();
    output_writer.close().await.unwrap();
    output_file.shutdown().await.unwrap();
}

Will yield ö.zip but öäöääö.txt's filename will be broken. The contents are correct.

Using the regular zip crate:

// [dependencies]
// zip = "0.5.13"

use std::io::Write;
use zip::write::FileOptions;

fn main() {
    let path = std::path::Path::new("ö.zip");
    let file = std::fs::File::create(&path).unwrap();

    let mut zip = zip::ZipWriter::new(file);

    let options = FileOptions::default()
        .compression_method(zip::CompressionMethod::Stored)
        .unix_permissions(0o755);
    zip.start_file("öäöääö.txt", options).unwrap();
    zip.write_all("hello öäöääö".to_string().as_bytes())
        .unwrap();

    zip.finish().unwrap();
}

This does not produce any problems.

Stack overflow

Hello, I'm trying to use your lib for the first time and just using the basic examples in your docs for reading. I'm getting:

thread 'main' has overflowed its stack
error: process didn't exit successfully: `target\debug\scprospector.exe` (exit code: 0xc00000fd, STATUS_STACK_OVERFLOW)

The zip file was created by 7zip for windows and is attached.
test.zip

How can I get this working?

main.rs

use scprospector::print_p4k_contents;

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    print_p4k_contents().await?;
    Ok(())
}

lib.rs

use async_zip::read::seek::ZipFileReader;
use tokio::fs::File;

pub async fn print_p4k_contents() -> Result<(), anyhow::Error> {
    let mut file = File::open("scprospector.zip").await?;
    dbg!(&file);
    println!("file open; trying to open as zip");
    let mut zip = ZipFileReader::new(&mut file).await?;

    println!("zip open; trying to read first entry");
    let reader = zip.entry_reader(0).await?;
    println!("first entry read; getting crc");
    let txt = reader.read_to_string_crc().await?;

    println!("{}", txt);
    Ok(())
}

It crashes at line let mut zip = ZipFileReader::new(&mut file).await?;.

ZipFileWriter not compatible with some types implementing AsyncWrite

First of all, thanks for the great crate!!

I'm currently trying to use this crate to write a zip compressed stream to a tokio::DuplexStream. This currently fails because DuplexStream implements poll_shutdown by not allowing any more writes to the stream. This is problematic as calling close on the ZipFileWriter calls shutdown() to finalize the compression encoder for each file added to the archive, which means that writes after the first call to close all fail.

I don't think DuplexStream is wrong here to not allow writes after being shutdown, the Tokio documentation for Shutdown() on the AsyncWrite trait mentions that after shutdown succeeds, "the I/O object itself is likely no longer usable." (https://docs.rs/tokio-io/0.1.13/tokio_io/trait.AsyncWrite.html#return-value)

From what I can tell, Shutdown is called here so that the async_compression encoder can finalize the encoding. I guess this is mostly OK for that use case (though I see someone has raised an issue about adding another way of finalizing - Nullus157/async-compression#141) but in this case it seems like it will stop this library working at all.

I guess it's fortunate that the tokio::File and tokio::Cursor<Vec<u8>> implementations of shutdown just call flush (https://github.com/tokio-rs/tokio/blob/702d6dccc948b6a460caa52da4e4c03b252c5c3b/tokio/src/fs/file.rs#L702 and https://github.com/tokio-rs/tokio/blob/702d6dccc948b6a460caa52da4e4c03b252c5c3b/tokio/src/io/async_write.rs#L376) so it's not so much of an issue for those two writers (at least for now, maybe the implementations will change in the future to actually shut down?).

For the time being I've worked around this by wrapping DuplexStream to do a flush instead of a shutdown, but it's not ideal:

struct DuplexStreamWrapper(DuplexStream);
impl AsyncWrite for DuplexStreamWrapper {
    fn poll_write(
        mut self: Pin<&mut Self>,
        cx: &mut std::task::Context<'_>,
        buf: &[u8],
    ) -> std::task::Poll<Result<usize, io::Error>> {
       Pin::new(&mut self.0).poll_write(cx, buf)
    }

    fn poll_flush(mut self: Pin<&mut Self>, cx: &mut std::task::Context<'_>) -> std::task::Poll<Result<(), io::Error>> {
        Pin::new(&mut self.0).poll_flush(cx)
    }

    fn poll_shutdown(mut self: Pin<&mut Self>, cx: &mut std::task::Context<'_>) -> std::task::Poll<Result<(), io::Error>> {
        Pin::new(&mut self.0).poll_flush(cx)
    }
}

Would be happy to help out with a fix but thought I'd raise this before trying anything in case there's a better way of going about what I'm trying to do. I'm not sure if this problem should/could be fixed/worked-around in this crate, or if a change needs to be implemented in async_compression to allow finalizing the encoder without shutting down the writer first.

How to construct ZipDateTime without chrono dependency?

Hi,

Thanks for the library! I want to construct a ZipDateTime but do not want to use chrono as I already use time directly.

I am porting from synchronous code that uses zip; previously I could use this function, but I can't see an equivalent in ZipDateTime.

Am I missing something?

Thanks 🙏

deflate64 support

Windows may create zip files with the deflate64 compression method, but this library doesn't support deflate64, so decompressing such a zip file fails.
I want to decompress these zip files, so I would like rs-async-zip to support deflate64.

Depends on Nullus157/async-compression#237

Feature: data descriptors in streaming mode timeline

cargo-binstall just experienced one failure due to this not being supported in streaming mode.

The error message says it will get supported soon, so I wonder if there is a timeline for this, i.e. if it's going to be supported in the next release and how long it will take.

If it will take quite some time for it to be supported, we might switch to file-based API.

This is just an issue asking for the timeline, not asking for a new feature, @Majored please don't feel pressured.
