Giter Club home page Giter Club logo

Comments (15)

akalin avatar akalin commented on June 20, 2024

Can you pass in the path to the .par2 file directly, e.g. ~/go/bin/par v /mnt/casa/storage/1/Full-0005.par2? Currently the par1 vs. par2 detection is very dumb and will treat it as par1 unless the file ends in .par2 (which is what's causing the error).

I'd be curious to see if gopar runs into a similar problem as par2cmdline did!

from gopar.

apalazzi avatar apalazzi commented on June 20, 2024

I've run a test and the result is the same as with par2cmdline: the file is not repaired and, to be more precise, the "repaired" file is the same as the original Full-0005 file.

Here is the (cut down) log of the repair operation:

Loaded file description packet for "Full-0005" (ID=0be49d13888f6c69ea09b2307d58f0dd, 19327343417 bytes)
Loaded checksums for file with ID 0be49d13888f6c69ea09b2307d58f0dd
Loaded main packet: slice byte count=9663672, recovery set size=1, non-recovery set size=0
Loaded creator packet with client ID "Created by par2cmdline version 0.7.4."
Hash mismatch for "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd)
[1/1] Loaded data file "Full-0005" (19327343417 bytes, 1999 hits, 9663672 misses)
Corrupt data chunk: "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd), bytes 18148376016 to 18158039687
Loaded recovery packet: exponent=0, byte count=9663672
[1] Loaded volume file "Full-0005.vol000+001.par2"
Loaded recovery packet: exponent=1, byte count=9663672
Loaded recovery packet: exponent=2, byte count=9663672
[2] Loaded volume file "Full-0005.vol001+002.par2"
Loaded recovery packet: exponent=3, byte count=9663672
Loaded recovery packet: exponent=4, byte count=9663672
Loaded recovery packet: exponent=5, byte count=9663672
Loaded recovery packet: exponent=6, byte count=9663672
[3] Loaded volume file "Full-0005.vol003+004.par2"
Loaded recovery packet: exponent=7, byte count=9663672
Loaded recovery packet: exponent=8, byte count=9663672
Loaded recovery packet: exponent=9, byte count=9663672
Loaded recovery packet: exponent=10, byte count=9663672
Loaded recovery packet: exponent=11, byte count=9663672
[.....]
Loaded recovery packet: exponent=105, byte count=9663672
Loaded recovery packet: exponent=106, byte count=9663672
Loaded recovery packet: exponent=107, byte count=9663672
Loaded recovery packet: exponent=108, byte count=9663672
Loaded recovery packet: exponent=109, byte count=9663672
Loaded recovery packet: exponent=110, byte count=9663672

Is the log supposed to end like this, without a message of "repair terminated" or something?

from gopar.

akalin avatar akalin commented on June 20, 2024

Hmm, the I/O for gopar is quite unoptimized, since it just loads everything into memory, so it's entirely possible it's just running out of memory and getting killed. I'll take a closer look...

from gopar.

akalin avatar akalin commented on June 20, 2024

One thing you can try: it seems it's running out of memory when trying to load all of the recovery packets, but you only need one. If you move all but one or two recovery file out of the way, perhaps it can make further progress?

from gopar.

apalazzi avatar apalazzi commented on June 20, 2024

I noticed that it needs a lot of memory ;) but if it runs out of memory usually either the process is killed and the system reports it, or the system hangs. I'll try with only one file and see what happens.

from gopar.

apalazzi avatar apalazzi commented on June 20, 2024

Here are the results of the test with only one file:

Loaded file description packet for "Full-0005" (ID=0be49d13888f6c69ea09b2307d58f0dd, 19327343417 bytes)
Loaded checksums for file with ID 0be49d13888f6c69ea09b2307d58f0dd
Loaded main packet: slice byte count=9663672, recovery set size=1, non-recovery set size=0
Loaded creator packet with client ID "Created by par2cmdline version 0.7.4."
Hash mismatch for "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd)
[1/1] Loaded data file "Full-0005" (19327343417 bytes, 1999 hits, 9663672 misses)
Corrupt data chunk: "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd), bytes 18148376016 to 18158039687
Loaded recovery packet: exponent=3, byte count=9663672
Loaded recovery packet: exponent=4, byte count=9663672
Loaded recovery packet: exponent=5, byte count=9663672
Loaded recovery packet: exponent=6, byte count=9663672
[1] Loaded volume file "Full-0005.vol003+004.par2"
Repair error: hash mismatch in reconstructed data

I've also tried with vol000+001 and vol001+2 with the same result.
Does that tells you something?

from gopar.

akalin avatar akalin commented on June 20, 2024

Hunh, interesting. So par2 stores the md5 hash of every file in the set, and that error is returned whenever the md5 of the reconstructed file still doesn't match the stored hash. This seems consistent with the behavior of par2cmdline in the other bug.

In this case, gopar refuses to write out the reconstructed file, whereas it looks like par2cmdline writes it out anyway.

I could perhaps add more output saying what the expected vs. actual hashes are, and also maybe add a switch to force writing out the reconstructed file even if the hashes don't match. Then you can perhaps compare the output of gopar vs. the output of par2cmdline to see if they match (which I expect), and that would mostly rule out a bug in par2cmdline's reconstruction. What do you think?

One more question: did you run gopar on the original corrupted file, or on the output of par2cmdline's reconstruction?

from gopar.

apalazzi avatar apalazzi commented on June 20, 2024

Yes, more output is welcome :) also, if I understand correctly, gopar overwrites the original file with the corrected data: maybe you could make it write a completely new file instead, that way we can rule out any filesystem issue in overwriting the file.

PS: the original file is identical to the one reconstructed by par2cmdline (same md5sum).

from gopar.

akalin avatar akalin commented on June 20, 2024

Ok, I added some more output -- I put it in the issue5-debug branch, so please compile from that. I also made it so it writes files to <filename>.recovered.

Oh, so par2cmdline wrote out a new repaired file, but it's identical to the original file? Interesting...one theory that came to mind is this sequence of events during the creation of the par2 files:

  1. par2cmdline scans Full-0005 to compute its MD5 hash and the other block checksums that par2 computes.
  2. Something else comes along and modifies the block between bytes 18148376016 to 18158039687.
  3. par2cmdline computes the recovery data, but using the modified file.
  4. par2cmdline writes out the par2 data with the old hashes but the new recovery data.

Then any subsequent run of par2cmdline/gopar would detect the corrupted block, but recovery would simply reconstruct the corrupted block.

Anyway, that's just a theory, but it seems to fit with all the observed behaviors. Unfortunately, if that's the explanation I don't think there's any practical way to recover the original data without knowing the structure of the file. :( I'll sleep on it and see if I can come up with another explanation that's more hopeful...

from gopar.

apalazzi avatar apalazzi commented on June 20, 2024

par2cmdline overwrites the original file, but in the end the repaired file is not changed. I'm not sure I understood your theory. Note also that this is not the only file where recovery fails.

Anyway, here is the results: please check also the preliminary steps to confirm I'm correctly running all the modified sources:

andrea@marcopolo:~/go/src/github.com/akalin/gopar$ git pull
warning: L'esecuzione di un pull senza specificare come riconciliare branch
divergenti non รจ consigliata. รˆ possibile sopprimere questo messaggio
eseguendo uno dei seguenti comandi prima di eseguire il prossimo pull:

  git config pull.rebase false  # merge (strategia predefinita)
  git config pull.rebase true   # rebase
  git config pull.ff only       # esegui solo fast forward

Puoi sostituire "git config" con "git config --global" per impostare una
preferenza predefinita per tutti i repository. Puoi anche passare gli
argomenti --rebase, --no-rebase o --ff-only sulla riga di comando per
eseguire l'override del valore predefinito configurato per una singola
invocazione.

remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 28 (delta 12), reused 22 (delta 12), pack-reused 0
Decompressione degli oggetti in corso: 100% (28/28), 22.64 KiB | 526.00 KiB/s, fatto.
Da https://github.com/akalin/gopar
   93c63e8..e638488  master       -> origin/master
 * [nuovo branch]    issue5-debug -> origin/issue5-debug
Aggiornamento di 93c63e8..e638488
Fast-forward
 cmd/par/main.go      | 59 ++++++++++++++++++++++++++++++++++++++---------------------
 par1/decoder.go      | 31 ++++++++++++-------------------
 par1/decoder_test.go | 24 ++++++++++++------------
 par1/encoder_test.go |  4 ++--
 par2/decoder.go      | 57 ++++++++++++++++++++++-----------------------------------
 par2/decoder_test.go | 24 ++++++++++++------------
 par2/encoder_test.go |  4 ++--
 rsec16/coder.go      | 46 ++++++++++++++++++++++++++++++++++++++++------
 rsec16/coder_test.go | 28 +++++++++++++++++++++-------
 9 files changed, 161 insertions(+), 116 deletions(-)
andrea@marcopolo:~/go/src/github.com/akalin/gopar$ git checkout issue5-debug
Branch 'issue5-debug' impostato per tracciare il branch remoto 'issue5-debug' da 'origin'.
Si รจ passati a un nuovo branch 'issue5-debug'

andrea@marcopolo:/mnt/casa/bareos-recover/gopar$ go run github.com/akalin/gopar/cmd/par r Full-0005.par2 
Loaded file description packet for "Full-0005" (ID=0be49d13888f6c69ea09b2307d58f0dd, 19327343417 bytes)
Loaded checksums for file with ID 0be49d13888f6c69ea09b2307d58f0dd
Loaded main packet: slice byte count=9663672, recovery set size=1, non-recovery set size=0
Loaded creator packet with client ID "Created by par2cmdline version 0.7.4."
Hash mismatch for "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd): 16k hash matches (868ce610521d5d69d1d26c664cc5d366) but expected hash (dc31cf125b1fd387dcb49bd86444bd84) doesn't match actual hash (a1ceb53101e86333edc69c8e057f5f9e)
[1/1] Loaded data file "Full-0005" (19327343417 bytes, 1999 hits, 9663672 misses)
Corrupt data chunk: "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd), bytes 18148376016 to 18158039687
Loaded recovery packet: exponent=3, byte count=9663672
Loaded recovery packet: exponent=4, byte count=9663672
Loaded recovery packet: exponent=5, byte count=9663672
Loaded recovery packet: exponent=6, byte count=9663672
[1] Loaded volume file "Full-0005.vol003+004.par2"
DEBUG: hash mismatch error for file Full-0005, 16k hash matches (868ce610521d5d69d1d26c664cc5d366) but expected hash (dc31cf125b1fd387dcb49bd86444bd84) doesn't match actual hash (27b03b2874e1456da3745bc38c7bf772)[1/1] Wrote data file "Full-0005.recovered" (19327343417 bytes)
Repaired files: [Full-0005]

andrea@marcopolo:/mnt/casa/bareos-recover/gopar$ md5sum Full-0005
a1ceb53101e86333edc69c8e057f5f9e  Full-0005
andrea@marcopolo:/mnt/casa/bareos-recover/gopar$ md5sum Full-0005.recovered 
27b03b2874e1456da3745bc38c7bf772  Full-0005.recovered

from gopar.

mdnahas avatar mdnahas commented on June 20, 2024

Hi akalin, I'm Mike Nahas, the designer of the PAR2 spec.

So, apallazi, I think what akalin was saying is: Is it possible that files were changing while generating the original par2 files? If they were, different versions of the input files might have been read at different times. The file data is used (1) to generate md5sums for each file and (2) to generate the recovery data. If those reads were different, the recovery data may be generating a file that doesn't match the md5sum for it.

BTW, if the data was compressed, most compression/archiving programs (ZIP, TAR, etc.) can still regenerate most of the files in them. So, hopefully, you haven't lost all your data. (I know that is small comfort after you used PAR2 just to avoid that issue. Sorry.)

Mike

from gopar.

akalin avatar akalin commented on June 20, 2024

@apalazzi that looks good. But yeah, unless you have any other ideas for things to try, the current theory of Full-0005 being modified during par2 generation seems most likely. Out of curiosity, what format is Full-0005? I wonder if it has some sort of integrity checking itself...

@mdnahas one weirdness is that gopar writes out a new file with a different md5, whereas it sounds like par2cmdline writes out a repaired file that's the same as the original one. Is that expected? Does par2cmdline do something different if it detects that the recovered file doesn't match the hash?

from gopar.

apalazzi avatar apalazzi commented on June 20, 2024

@mdnahas Don't worry, the data should be safe (but see below), I was just double-checking if everything was ok with the backup.
@akalin the file is a bareos backup file, it has some integrity check but not date recovery capabilities.

I've recently discovered that one of the ram modules was bad; even though the backup dates back to 2017, if the ram was already bad at the time that would explain the issue. If this is the case I guess it's impossible to recover the corrupted files... or is there a way to still recover the data?

from gopar.

mdnahas avatar mdnahas commented on June 20, 2024

@apalazzi If the memory was unreliable, it would be impossible to recover the data.

@akalin I'm just a designer of the spec. I don't know much about how par2cmdline is supposed to operate.

from gopar.

akalin avatar akalin commented on June 20, 2024

@apalazzi would it be possible to run those integrity checks on the various versions of the file? I'd be interested to see if any of them pass, and if some have more failures than others...

from gopar.

Related Issues (8)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.