Comments (8)

dragly commented on August 22, 2024

As mentioned in #26 (comment), the issue can also be reproduced by compressing the tests/files/range-coder-edge-case file with the option to write the unpacked size to the header, and then decompressing it:

use std::io::prelude::*;

fn main() {
    // Read the edge-case test file into memory.
    let mut x = Vec::new();
    std::fs::File::open("tests/files/range-coder-edge-case")
        .unwrap()
        .read_to_end(&mut x)
        .unwrap();

    // Compress with the exact unpacked size written into the header...
    let encode_options = lzma_rs::compress::Options {
        unpacked_size: lzma_rs::compress::UnpackedSize::WriteToHeader(Some(x.len() as u64)),
    };
    // ...and have the decoder read that size back and check the output against it.
    let decode_options = lzma_rs::decompress::Options {
        unpacked_size: lzma_rs::decompress::UnpackedSize::ReadFromHeader,
    };
    let mut compressed: Vec<u8> = Vec::new();
    lzma_rs::lzma_compress_with_options(
        &mut std::io::BufReader::new(x.as_slice()),
        &mut compressed,
        &encode_options,
    )
    .unwrap();

    // Decompress the round-tripped stream; this unwrap() is where the error appears.
    let mut bf = std::io::BufReader::new(compressed.as_slice());
    let mut decomp: Vec<u8> = Vec::new();
    lzma_rs::lzma_decompress_with_options(&mut bf, &mut decomp, &decode_options).unwrap();
}

from lzma-rs.

gendx commented on August 22, 2024

An LZMA stream can include an unpacked_size hint in its header (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma.rs#L61-L74), which the code then verifies to reject inconsistencies (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma.rs#L312-L320).
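For readers unfamiliar with the header, here is a minimal sketch of where that hint lives, assuming the classic 13-byte LZMA-Alone (.lzma) layout: one properties byte, a little-endian u32 dictionary size, then a little-endian u64 unpacked size where all ones means "unknown". The helper `header_unpacked_size` is hypothetical and is not part of the lzma-rs API.

```rust
/// Hypothetical helper (not lzma-rs API): extract the unpacked-size field
/// from a 13-byte LZMA-Alone header, assuming the layout:
///   byte 0      properties (lc/lp/pb)
///   bytes 1..5  dictionary size, little-endian u32
///   bytes 5..13 unpacked size, little-endian u64 (all ones = unknown)
fn header_unpacked_size(header: &[u8; 13]) -> Option<u64> {
    let mut field = [0u8; 8];
    field.copy_from_slice(&header[5..13]);
    let size = u64::from_le_bytes(field);
    if size == u64::MAX {
        None // size unknown: the stream is expected to end with an end marker
    } else {
        Some(size)
    }
}

fn main() {
    // Build a header that announces 5048 unpacked bytes.
    let mut header = [0u8; 13];
    header[0] = 0x5D; // lc=3, lp=0, pb=2
    header[1..5].copy_from_slice(&65536u32.to_le_bytes());
    header[5..13].copy_from_slice(&5048u64.to_le_bytes());
    println!("{:?}", header_unpacked_size(&header)); // Some(5048)

    // An all-ones size field means "unknown".
    header[5..13].copy_from_slice(&[0xFF; 8]);
    println!("{:?}", header_unpacked_size(&header)); // None
}
```

When the field is present, a decoder can compare it with the number of bytes it actually produced, which is the kind of consistency check linked above.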

Additionally, the LZMA2 format is a wrapper around LZMA, which can also provide an unpacked size hint on top of it (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma2.rs#L89-L95 and https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma2.rs#L161).
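As a rough illustration of the per-chunk size hint in LZMA2, the sketch below parses a chunk header under the commonly documented layout: control byte 0x00 ends the stream, 0x01/0x02 introduce an uncompressed chunk with a big-endian u16 size minus one, and 0x80..=0xFF introduce an LZMA chunk whose unpacked size is 21 bits (5 bits from the control byte plus a big-endian u16, minus one), followed by a big-endian u16 packed size minus one and, for controls 0xC0 and above, a properties byte. `Chunk` and `parse_chunk_header` are hypothetical names, not lzma-rs code.

```rust
/// Hypothetical representation of one LZMA2 chunk header.
#[derive(Debug, PartialEq)]
enum Chunk {
    End,
    Uncompressed { unpacked_size: usize },
    Lzma { unpacked_size: usize, packed_size: usize, has_props: bool },
}

/// Parses a chunk header; returns the chunk and the header length in bytes,
/// or None if the input is too short or the control byte is invalid.
fn parse_chunk_header(b: &[u8]) -> Option<(Chunk, usize)> {
    let control = *b.first()?;
    match control {
        0x00 => Some((Chunk::End, 1)),
        0x01 | 0x02 => {
            if b.len() < 3 {
                return None;
            }
            let size = u16::from_be_bytes([b[1], b[2]]) as usize + 1;
            Some((Chunk::Uncompressed { unpacked_size: size }, 3))
        }
        0x80..=0xFF => {
            if b.len() < 5 {
                return None;
            }
            // Low 5 bits of the control byte are the high bits of the unpacked size.
            let unpacked = (((control & 0x1F) as usize) << 16)
                + u16::from_be_bytes([b[1], b[2]]) as usize
                + 1;
            let packed = u16::from_be_bytes([b[3], b[4]]) as usize + 1;
            // Reset modes 2 and 3 (control >= 0xC0) carry a new properties byte.
            let has_props = control >= 0xC0;
            let len = if has_props { 6 } else { 5 };
            if b.len() < len {
                return None;
            }
            let chunk = Chunk::Lzma { unpacked_size: unpacked, packed_size: packed, has_props };
            Some((chunk, len))
        }
        _ => None, // 0x03..=0x7F are not valid control bytes
    }
}

fn main() {
    // Control 0xE0: LZMA chunk with dictionary reset and new props, 16 bytes out, 10 bytes in.
    let header = [0xE0, 0x00, 0x0F, 0x00, 0x09, 0x5D];
    println!("{:?}", parse_chunk_header(&header));
}
```

Because every LZMA chunk announces its own unpacked size, a mismatch can be detected per chunk, independently of any size stored at the XZ level.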

On top of that, XZ compresses each file with an LZMA2 stream.

So it looks like either your file was corrupted or there is a bug in my code due to a corner case that I didn't see before.

  • Can you comment out the error check (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma.rs#L312-L320) and let me know if decompression works for your file?
  • Do you know which software created this archive?
  • Can you run your code with the environment variable RUST_LOG=lzma_rs=info set, so that I can get a clearer idea of what is going on?
  • If the file is publicly available (or if you can reproduce the issue on a publicly available file), can you point it to me so that I can debug further?

gendx commented on August 22, 2024

Would #17 (or a variant of it) work for this use case?

ibaryshnikov commented on August 22, 2024

@gendx I created a reproduction; will it help?

use std::io::BufReader;
use lzma_rs::decompress::{Options, UnpackedSize};

const DATA: &[u8] = &[
    93, 0, 0, 1, 0, 0, 0, 111, 253, 255, 255, 163, 183, 255, 71, 62, 72, 21, 114, 57, 97, 81, 184,
    146, 40, 230, 143, 221, 66, 251, 179, 253, 113, 133, 36, 209, 157, 136, 6, 166, 184, 144, 144,
    180, 72, 27, 108, 146, 211, 153, 161, 58, 255, 52, 129, 75, 240, 91, 145, 234, 14, 20, 173, 77,
    167, 21, 218, 124, 215, 37, 87, 175, 123, 84, 42, 90, 42, 15, 40, 156, 200, 228, 82, 146, 100,
    78, 137, 120, 145, 121, 117, 60, 144, 172, 178, 50, 13, 116, 246, 17, 195, 181, 90, 136, 248,
    128, 160, 103, 203, 131, 61, 101, 79, 13, 188, 166, 86, 177, 61, 29, 24, 147, 226, 211, 42, 16,
    116, 153, 103, 9, 17, 112, 188, 159, 117, 114, 125, 209, 157, 150, 224, 44, 197, 39, 232, 193,
    190, 15, 0, 4, 130, 28, 84, 73, 91, 189, 120, 8, 69, 78, 165, 182, 187, 252, 105, 241, 61, 199,
    210, 26, 194, 15, 70, 225, 186, 144, 150, 195, 46, 150, 103, 144, 224, 196, 136, 25, 140, 45,
    169, 29, 100, 201, 225, 234, 59, 16, 254, 147, 168, 89, 240, 42, 238, 251, 69, 135, 217, 29,
    243, 218, 10, 172, 191, 192, 95, 186, 36, 117, 158, 138, 110, 8, 207, 141, 154, 9, 159, 181, 3,
    71, 95, 111, 99, 247, 247, 33, 89, 114, 7, 61, 46, 250, 138, 21, 2, 105, 135, 90, 83, 215, 223,
    60, 180, 69, 243, 112, 226, 228, 100, 144, 11, 167, 204, 83, 148, 112, 122, 31, 30, 71, 230,
    64, 211, 22, 193, 147, 121, 76, 180, 3, 79, 198, 164, 40, 176, 206, 62, 34, 200, 114, 9, 81,
    33, 129, 115, 94, 77, 166, 124, 38, 148, 20, 62, 133, 46, 21, 63, 37, 112, 202, 221, 26, 34, 4,
    13, 189, 74, 75, 162, 189, 241, 123, 154, 163, 59, 7, 148, 203, 156, 18, 125, 126, 147, 209,
    158, 105, 231, 27, 203, 191, 132, 50, 146, 226, 22, 201, 251, 40, 255, 101, 201, 255, 75, 201,
    60, 5, 36, 246, 121, 87, 144, 239, 19, 138, 52, 229, 23, 193, 207, 4, 113, 151, 154, 147, 223,
    52, 140, 114, 174, 146, 90, 0, 42, 38, 113, 62, 58, 164, 224, 122, 82, 205, 66, 43, 153, 64,
    134, 64, 140, 123, 119, 237, 154, 159, 175, 94, 254, 119, 160, 234, 217, 50, 124, 84, 137, 204,
    160, 36, 83, 32, 91, 171, 136, 100, 221, 214, 36, 161, 168, 31, 105, 199, 188, 91, 14, 248, 37,
    175, 98, 22, 164, 68, 234, 76, 175, 144, 32, 39, 10, 60, 201, 181, 100, 52, 184, 202, 194, 77,
    159, 147, 177, 98, 172, 139, 31, 185, 230, 46, 171, 105, 55, 106, 24, 254, 236, 255, 110, 189,
    247, 139, 213, 200, 241, 113, 20, 28, 232, 144, 194, 54, 188, 180, 193, 196, 73, 234, 60, 111,
    87, 228, 113, 186, 65, 174, 66, 219, 80, 167, 249, 36, 43, 57, 144, 101, 25, 188, 250, 28, 217,
    2, 203, 195, 217, 6, 52, 125, 206, 106, 211, 148, 190, 119, 126, 34, 100, 117, 218, 183, 135,
    108, 77, 244, 54, 116, 167, 24, 113, 104, 211, 29, 14, 143, 255, 124, 241, 74, 135, 140, 131,
    196, 245, 234, 245, 213, 189, 35, 139, 127, 212, 247, 0,
];
const PACKED_SIZE: u64 = 566;
const UNPACKED_SIZE: u64 = 5048;

fn main() {
    let mut input = BufReader::new(DATA);
    let mut output = vec![];
    let options = Options {
        unpacked_size: UnpackedSize::UseProvided(Some(UNPACKED_SIZE)),
    };
    let result = lzma_rs::lzma_decompress_with_options(&mut input, &mut output, &options);
    println!("The result is {:?}", result);
}

It prints the error "Expected unpacked size of 5048 but decompressed to 5046".
The packed size is 566 bytes; the 5 additional bytes at the start of DATA are the properties.
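The five property bytes at the start of DATA (93, 0, 0, 1, 0) can be decoded by hand. The sketch below assumes the standard LZMA encoding, where the first byte is (pb * 5 + lp) * 9 + lc and the next four bytes are a little-endian dictionary size; `decode_props` is a hypothetical helper, not an lzma-rs function.

```rust
/// Hypothetical helper: decode LZMA properties into (lc, lp, pb, dict_size).
/// Assumes byte 0 = (pb * 5 + lp) * 9 + lc and bytes 1..5 = LE u32 dict size.
fn decode_props(header: &[u8]) -> Option<(u32, u32, u32, u32)> {
    if header.len() < 5 {
        return None;
    }
    let mut d = u32::from(header[0]);
    if d >= 9 * 5 * 5 {
        return None; // out-of-range properties byte
    }
    let lc = d % 9; // literal context bits
    d /= 9;
    let lp = d % 5; // literal position bits
    let pb = d / 5; // position bits
    let dict_size = u32::from_le_bytes([header[1], header[2], header[3], header[4]]);
    Some((lc, lp, pb, dict_size))
}

fn main() {
    // First five bytes of DATA in the reproduction above.
    println!("{:?}", decode_props(&[93, 0, 0, 1, 0])); // Some((3, 0, 2, 65536))
}
```

So the reproduction uses the common lc=3, lp=0, pb=2 settings with a 64 KiB dictionary, and no size field follows the properties because the size is supplied through UnpackedSize::UseProvided.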

gendx commented on August 22, 2024

Thanks @ibaryshnikov for your example.

However, I don't see how this differs from the expected behavior. You provide an expected unpacked size of 5048 bytes, but the decompressed output is only 5046 bytes. When I set the expected size to 5046, your example stream decompresses fine.

So to me this works as intended - if the decompressed size doesn't match the expected one you provided, an error should be reported instead of returning any partial and/or potentially corrupted result. If you don't know the expected size, you can use UnpackedSize::ReadFromHeader (the default decoding option) - as long as the stream header provides it - or UnpackedSize::UseProvided(None).
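The policy described here can be modeled in a few lines. This is a simplified sketch, not lzma-rs code: `SizeSource` is a hypothetical stand-in for the two UnpackedSize variants mentioned above, and the error text copies the message quoted in this thread.

```rust
/// Hypothetical mirror of the two decoding modes discussed above
/// (not the lzma-rs type itself).
#[derive(Clone, Copy, Debug)]
enum SizeSource {
    /// Trust the size field in the stream header, if the header has one.
    ReadFromHeader,
    /// The header has no size field; the caller optionally supplies one.
    UseProvided(Option<u64>),
}

/// Decide which size, if any, the output must match.
fn expected_size(source: SizeSource, header_size: Option<u64>) -> Option<u64> {
    match source {
        SizeSource::ReadFromHeader => header_size,
        SizeSource::UseProvided(size) => size,
    }
}

/// Reject the output if a size was expected and it does not match.
fn check_size(expected: Option<u64>, actual: u64) -> Result<(), String> {
    match expected {
        Some(e) if e != actual => Err(format!(
            "Expected unpacked size of {} but decompressed to {}",
            e, actual
        )),
        _ => Ok(()),
    }
}

fn main() {
    // ibaryshnikov's case: 5048 expected, 5046 produced, so an error is reported.
    let strict = expected_size(SizeSource::UseProvided(Some(5048)), None);
    println!("{:?}", check_size(strict, 5046));
    // No size supplied: there is nothing to compare against.
    let lenient = expected_size(SizeSource::UseProvided(None), None);
    println!("{:?}", check_size(lenient, 5046));
}
```

The design choice is to fail loudly rather than hand back output that may be truncated; callers who cannot supply a size opt out explicitly.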

ibaryshnikov commented on August 22, 2024

@gendx, thanks for checking this example. It's a bit tricky to detect when the input has ended. The decoder can keep the same code and iterate over it several times using different ranges. In my example, the second-to-last code is 1063818487, and there are two valid ranges for it: first 2663792640, then 1320009537. Then it switches to the last code, which is 0, and again we can iterate over this code using different ranges. After removing the break on

pub fn is_finished_ok(&mut self) -> io::Result<bool> {
    Ok(self.code == 0 && util::is_eof(self.stream)?)
}

I got three ranges for code 0: 2212886016, 1089365498 and 547851036 (previously there was only 2212886016). That's how we can recover the last two bytes and get 5048 bytes in total. I've compared the results with a library in another language, and they seem correct.
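Why a zero code does not mean "done" can be seen with a bare binary range decoder. The sketch below assumes the usual LZMA bit model (11-bit probabilities, adaptation shift of 5, normalization once the range drops below 2^24); it is a simplified model, not lzma-rs internals, and `decode_without_input` is a hypothetical helper. With code == 0, every decoded bit is 0 and the code stays 0, so the decoder keeps producing symbols from the existing state until the range shrinks enough to need another input byte.

```rust
/// Simplified LZMA-style binary range decoder state (a sketch).
struct RangeDecoder {
    range: u32,
    code: u32,
}

const TOP: u32 = 1 << 24; // below this, a real decoder must read another byte
const PROB_ONE: u16 = 1 << 11; // probabilities are 11-bit fixed point
const PROB_INIT: u16 = PROB_ONE / 2; // 0.5
const MOVE_BITS: u32 = 5; // adaptation speed

impl RangeDecoder {
    fn decode_bit(&mut self, prob: &mut u16) -> u32 {
        let bound = (self.range >> 11) * u32::from(*prob);
        if self.code < bound {
            self.range = bound;
            *prob += (PROB_ONE - *prob) >> MOVE_BITS;
            0
        } else {
            self.range -= bound;
            self.code -= bound;
            *prob -= *prob >> MOVE_BITS;
            1
        }
    }
}

/// Decodes bits with one adaptive probability until the decoder would need
/// more input. Returns (bits decoded, whether they were all zero).
fn decode_without_input(range: u32, code: u32) -> (usize, bool) {
    let mut rc = RangeDecoder { range, code };
    let mut prob = PROB_INIT;
    let (mut count, mut all_zero) = (0, true);
    while rc.range >= TOP {
        if rc.decode_bit(&mut prob) != 0 {
            all_zero = false;
        }
        count += 1;
    }
    (count, all_zero)
}

fn main() {
    // Range value taken from the thread; a single fresh probability context is
    // assumed, so the exact bit count is illustrative only.
    let (bits, all_zero) = decode_without_input(2_212_886_016, 0);
    println!("decoded {} bits without new input, all zero: {}", bits, all_zero);
}
```

Under this model, a check like `code == 0 && eof` can fire while decodable symbols remain, which is consistent with the two missing bytes; when the unpacked size is known, stopping on the byte count avoids relying on that heuristic.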

I don't think it's related to the original issue, where the difference in unpacked size is quite large (149198 vs 483334), but it may be a separate issue. @gendx, what do you think?

dragly commented on August 22, 2024

We are seeing the same issue, although with a very tiny difference:

LZMAError("Expected unpacked size of 116412 but decompressed to 116411")

Unfortunately, it is again in a file that I cannot share. I have also so far been unable to reproduce the issue with other files.

However, the fix in #26 works for us as well.

antonsmetanin commented on August 22, 2024

I'm having the same issue with this file:
http://beta.unity3d.com/download/d691e07d38ef/LinuxEditorInstaller/Unity.tar.xz

fn main() {
    let mut file = std::io::BufReader::new(std::fs::File::open("Unity.tar.xz").unwrap());
    let mut decomp: Vec<u8> = Vec::new();

    lzma_rs::xz_decompress(&mut file, &mut decomp).unwrap();
}

This code produces the following error:

LZMAError("Expected unpacked size of 153357 but decompressed to 779954")