
diskrsync's Issues

cannot use path@version syntax in GOPATH mode

Following the usage instructions doesn't work for me. I'm using Debian 11.

221024 11:22 /usr/local/ch-tools3/GO root@piglet{2}# GOPATH=$(pwd) go install github.com/dop251/diskrsync/diskrsync@latest
package github.com/dop251/diskrsync/diskrsync@latest: cannot use path@version syntax in GOPATH mode
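
A possible fix, sketched (the path@version syntax needs a module-aware go install, which arrived in Go 1.16; Debian 11 ships Go 1.15, so the second form may be needed):

# Go 1.16+: module mode, no GOPATH required
go install github.com/dop251/diskrsync/diskrsync@latest
# older toolchains (e.g. Go 1.15): force module mode and drop the @version
GO111MODULE=on go get github.com/dop251/diskrsync/diskrsync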

Can diskrsync write to a zfs volume ?

Hi,

I'm trying to move a qcow2 disk image over the network to a zvol (ZFS volume).
I've attached the qcow2 file to nbd0 on the remote server and then on the local server:
diskrsync --ssh-flags="-p22" --verbose --no-compress [email protected]:/dev/nbd0 /dev/zvol/vm-storage/vm-100-disk1
The program does not transmit the bulk of the data over the network.
On the remote node (the one with the qcow2) strace shows some futex calls and resource-unavailable errors.
Should I wait longer, or is this mode of writing directly to a block device not supported?
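
One sanity check worth running first (a sketch, using the device paths from the command above): a zvol is a fixed-size block device, so the target must be at least as large as the source, because a block device cannot be grown by writing past its end.

# compare the sizes of both block devices, in bytes
ssh -p22 user@remote blockdev --getsize64 /dev/nbd0
blockdev --getsize64 /dev/zvol/vm-storage/vm-100-disk1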

Build instructions

It would be neat to include something like this in the README for noobs like me (since you've already explained how to set up the Go environment and pull the repo):

To rebuild the code try:

cd src/github.com/dop251/diskrsync
go build -o /tmp/diskrsync ./diskrsync

Disabling hole punching?

Any ideas what's wrong here:

# diskrsync --verbose --no-compress /dev/vg-root/lv-root-snap [email protected]:~/backup.img
2018/01/07 19:13:33 Target failed: operation not supported
2018/01/07 19:13:33 exit status 1

I took a snapshot of an LV which contains LUKS. I've managed to use diskrsync on another LUKS device directly, so I'm wondering whether LVM is somehow breaking compatibility.
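
If hole punching is indeed the culprit, it can be tested directly with util-linux's fallocate on the filesystem holding backup.img (a sketch; the test path is illustrative):

truncate -s 1M /tmp/holetest
fallocate --punch-hole --offset 0 --length 65536 /tmp/holetest
# prints "fallocate failed: Operation not supported" on filesystems
# that cannot punch holes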

Increasing order or depth of tree?

I would like to increase the tree size so that I can achieve a smaller block size on a 1 TiB transfer. Before I simply do so (and add a command line option), are there any specific challenges of which I should be aware? An 8x increase in number of blocks would be sufficient to meet my needs.
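
For a sense of scale (illustrative numbers only, not diskrsync's actual constants): the leaf block size is the total size divided by the number of leaf blocks, so 8x more blocks means 8x smaller blocks.

# 1 TiB with 2^21 leaf blocks vs. 2^24 leaf blocks
echo $(( (1 << 40) / (1 << 21) ))   # 524288 bytes = 512 KiB
echo $(( (1 << 40) / (1 << 24) ))   # 65536 bytes  = 64 KiB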

No INSTALL instructions

Hello,

Could you please add some instructions on what are the build dependencies and how to compile the app?

Thanks.
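
Until the README covers it, a minimal sketch (assuming a working Go toolchain; being pure Go, the tool should need no other build dependencies):

# install straight from the module proxy (Go 1.16+):
go install github.com/dop251/diskrsync/diskrsync@latest
# or build from a clone:
git clone https://github.com/dop251/diskrsync
cd diskrsync
go build -o ~/bin/diskrsync ./diskrsync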

Mounting the backup?

What is the format of the backup file? I ran sudo diskrsync /dev/sdb1 ~/usbstick/stick.img. Is the target file some kind of image file that can be mounted?

Motivation: I back up my encrypted disk to a remote location. I would like to be able to access that backup file from yet another computer. I've done this with sshfs+losetup+cryptsetup; in that case the image file contains a LUKS-encrypted filesystem. I'm wondering if something similar could be done with the backup file resulting from diskrsync (assuming the original source was a LUKS-encrypted block device):

# Mount remote over SSH
sshfs $REMOTE $IMAGEDIR

# Use loop device to access the image file as a file system
losetup $LOOPDEVICE $IMAGEDIR/$IMAGEFILE

# Decrypt the encrypted file system
cryptsetup luksOpen $LOOPDEVICE $MAPPER

# Mount the decrypted file system
mount /dev/mapper/$MAPPER $MOUNTDIR

I'll play around and see if I can make it work, but I would like to hear if you have the answer.
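
One detail that presumably matters here, judging by the help output quoted further down (-no-compress: "Store target as a raw file"): only a raw image can be attached with losetup, so the backup would need to be taken with that flag set (host and path illustrative):

diskrsync --no-compress /dev/sdb1 user@remote:stick.img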

Source read errors - assume zeros

Sometimes when copying a block device, you stumble upon bad sectors. It could happen with any drive, and it could be as small as 1 sector on the whole drive.
Unfortunately, diskrsync just stops on read error.
Sometimes backing up whatever you can from that drive to a remote location is the only option - because you don't have enough space to run ddrescue locally.
Sure, I don't ask you to implement ddrescue inside diskrsync. But it would be nice if, on a read error, diskrsync just assumed the read returned all zeroes and kept transferring the other data.
This mode could be non-default and enabled with some kind of flag. Still, it would allow preserving the rest of the data.
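
Until such a flag exists, a one-shot workaround sketch with GNU dd (conv=noerror,sync continues past read errors and pads the failed blocks with zeroes; note this is a full copy, not an incremental sync, and the paths are illustrative):

dd if=/dev/sdX bs=64K conv=noerror,sync | ssh user@remote 'dd of=backup.img bs=64K'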

Target failed: target: while copying block: file too large

I'm backing up a 21.8 TB disk with diskrsync to a remote server running ext4.

Sync  16.00 TiB / 21.83 TiB [=============================================================>                      ]  73 %
2023/02/01 23:08:25 Target failed: target: while copying block: file too large
2023/02/01 23:08:25 Read: 16, wrote: 14753557643280
2023/02/01 23:08:25 target error: exit status 1

I found that ext4's limit for a single file is 16 TiB. Could you add a workaround for this limit, such as splitting into multiple target files?

I want to continue this backup without copying the first 16 TB again.

Thanks
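
For context, the 16 TiB ceiling is ext4's per-file limit at the default 4 KiB filesystem block size; the target's block size can be confirmed with (device path illustrative):

tune2fs -l /dev/sdX1 | grep 'Block size'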

Please document how the remote diskrsync is found

I noticed that diskrsync was pretty flexible in finding ./diskrsync at the remote end (it was not in the PATH). How exactly is the remote end invoked, and how does it find the remote program to run?

If you tell me I'll create a PR against the README if you want.

(Sidenote: the included help uses the -verbose format while the usage examples use the --verbose format [double dash]; maybe these should be synced.)

bash: diskrsync: command not found

Do I need to install diskrsync on the remote machine too, or what could be the issue here:

$ diskrsync --no-compress /dev/sda2 [email protected]:~/disk.img
bash: diskrsync: command not found
2018/01/02 17:14:12 exit status 127

I do have diskrsync available on my local/source machine:

$ diskrsync -h
Usage of diskrsync:
  -no-compress
        Store target as a raw file
  -source
        Source mode
  -ssh-flags string
        SSH flags
  -target
        Target mode
  -verbose
        Print statistics and some debug info
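
The remote end does run diskrsync too (see "Adding target-path parameter" further down), so it has to be installed there and visible to a non-interactive shell; exit status 127 is the shell's "command not found". A quick sketch for narrowing it down (host illustrative):

ssh user@remote 'command -v diskrsync'   # prints nothing if it is not on PATH
ssh user@remote 'echo $PATH'             # non-interactive PATH may differ from
                                         # your login shell's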

Windows support

It would be nice if diskrsync allowed copying from Windows hosts as well, by using \\.\PHYSICALDRIVE0 or \\?\Volume{guid} paths for the actual block devices.

I've actually checked (attempted to compile and run it under Windows, and it almost worked): the default Go file APIs allow opening such devices, though it would be best to also call a Windows-specific API to get the minimum allowed buffer size for reading them. Also, paths with backslashes don't seem to work too well.

Now, this one actually depends on #17 because calling from Windows implies having .exe in a path, and we can't have that on the target Linux box.
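
If support landed, a hypothetical invocation might look like this (nothing here exists yet; --target-path is the flag proposed in "Adding target-path parameter" further down, used to avoid sending the .exe name to the Linux side):

diskrsync.exe --target-path diskrsync \\.\PHYSICALDRIVE0 user@linuxbox:/backup/disk0.img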

Deduplication

I confess I am not fluent in go, so there's a bit of guesswork at hand.

If I observe correctly you calculate hashes for all blocks, so when transferring, you have all the hashes for all the blocks.

Would it be possible, for a mismatched block, to look up its hash in the hash table, and if a block with that hash already exists on the target, transfer only the hash instead of the whole block, basically reusing locally existing data instead of transferring it?

I would use something like a --dedup switch for that, if you feel someone would not like to use it (as it may be CPU intensive in exchange for less traffic).

I wouldn't try to write it myself unless absolutely necessary, since getting familiar with Go would take me quite a long time.

Adding offset and size parameters

Hello!

First of all, big thanks for this tool. You saved my life with this.

Now, an enhancement request. Would it be possible to add --offset and --size options for the source file/drive? Possibly even a --target-offset as well!

Rationale: I have a 2 TB device I need to sync to a remote host. The internet connection is imperfect (and slow), so I need to restart it sometimes.
Now, on reconnect, rechecking even 30% of a 2 TB drive takes about 2 hours (because of the actual device speed). That is 2 hours of checking data which is known to be OK, or which could be easily and quickly fixed by one last full-file sweep at the end of the transfer.

Now, with --offset and --size parameters I could split the same transfers into chunks of, let's say, 100 GB each, syncing each of them, checking for integrity and moving on to the next. This would allow for faster checks on reconnections, and in the end I could just concatenate the resulting files back to the full image and be done with it.

With --target-offset I won't even need to concatenate the files, it'll just happen automatically.

I'm pretty positive that operating with offsets would be beneficial for other uses as well, basically allowing diskrsync to become something like dd over ssh.

Thanks again for the amazing tool!
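
A hypothetical usage sketch with the proposed flags (neither --offset nor --size exists yet; sizes, host and paths are illustrative):

# sync a 2 TB device in 100 GB chunks
chunk=$((100 * 1024 * 1024 * 1024))
for i in $(seq 0 19); do
    diskrsync --offset $((i * chunk)) --size $chunk \
        /dev/source user@remote:chunk-$(printf '%02d' $i).img
done
# afterwards, on the remote:
cat chunk-*.img > full.img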

Dependencies? (other than Go deps)

I'm getting the following error when running diskrsync:

bash: /...../bin/diskrsync: No such file or directory
2018/04/22 17:45:07 exit status 127

It works when the destination is just a local filesystem path, but with remote locations I get this error. It asks for my password for the remote location and then raises this error, so I think SSH is working. And I do have ssh installed. Could I be missing some other dependency? I couldn't find a list of required packages (other than the Go dependencies).

Append before checking option

This is another take on the problem possibly solvable by #15.

It would be nice if diskrsync had an option (e.g. --append-before-checking) implementing this behaviour:

If diskrsync detects that the source file is bigger than the target file, it would first append all the "missing" data to the target until the sizes match, and only then do the checksum sweep and download the mismatching blocks. This would make resuming broken uploads a lot easier, without checking all the data when we already know the sizes don't match.
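
In the meantime, a manual approximation with stock tools (a sketch; paths are illustrative and GNU dd is assumed): push the missing tail over ssh with plain dd, then let diskrsync verify the whole file.

tsize=$(ssh user@remote stat -c %s backup.img)
dd if=/dev/source bs=1M iflag=skip_bytes skip="$tsize" \
    | ssh user@remote 'dd of=backup.img bs=1M oflag=append conv=notrunc'
diskrsync /dev/source user@remote:backup.img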

I didn't understand it well

I cloned several 32 GB pendrives with a Linux system installed, and I modified one of them ("the master") with some small programs and files. Can I use diskrsync to re-clone the pendrives, copying only the differing blocks?

Adding target-path parameter

For now, it seems that when diskrsync connects to the target host, it tries to call diskrsync with the same argv[0] it was called with locally. If diskrsync was invoked simply as diskrsync, it runs diskrsync on the target as well; when invoked as /somepath/diskrsync, it executes /somepath/diskrsync on the target too.

It would be nice to be able to override this behaviour by setting the target binary path explicitly via an option, e.g. diskrsync --target-path /somepath/diskrsync.
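
To illustrate the current behaviour described above (the remote side's arguments are shown only roughly):

diskrsync src user@remote:tgt            # remote runs: diskrsync --target ...
/opt/bin/diskrsync src user@remote:tgt   # remote runs: /opt/bin/diskrsync --target ...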

Feature request: rollback versioning

It would be great if diskrsync supported versioning, so that it is possible to roll back to an old backup. Whenever the target is updated, the "reverse" of the update diff would be stored as a rollback diff. These diffs could then be applied to the image file in case one wants to roll back the image.

Connection stalled detection

Sometimes when the internet goes wonky, the underlying SSH connection stalls, resulting in zero transfer speed. At this stage the transfer appears stuck because the SSH client itself doesn't realise what is happening, and ConnectTimeout doesn't help on an already established connection.

It would be nice if diskrsync could automatically detect such conditions, terminate ssh, and restart the transfer automatically (probably rechecking both the source and the target to find out from where to resume).
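
Not a full fix, but OpenSSH's keepalive options at least turn a silent stall into a hard failure, and they can be passed through the existing --ssh-flags option (values and paths illustrative):

diskrsync --ssh-flags="-o ServerAliveInterval=15 -o ServerAliveCountMax=4" \
    /dev/source user@remote:backup.img
# ssh aborts after ~60 s without any server response instead of hanging forever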

This repo still active?

Hi, just wondering if this is still active and whether you are taking requests and PRs?

In particular I would like to add TLS connectivity besides ssh if that is possible.
