

ts678 commented on August 17, 2024

The step before step 1 might be to let the user pick a version, which got more complicated after partial backups got formalized.
The IsFullBackup setting requires looking at the fileset file in the dlist, which is especially unfortunate for those who retain many versions (a small reading sketch follows these step notes).

Step 3 is unclear on whether it builds a full block map, or is tailored around the maybe smaller needs of the specific files chosen.

Step 4 is unclear on how dblock files are selected, e.g. does it build on step 3? Random downloads until done would be wasteful.

Step 5 raises the questions of how big the cache is (the RecoveryTool's can be all dblocks, so big), and what happens when it runs out.

Step 6 is unclear compared to step 5, which sounds like it involves dblocks and target files. How does step 6 get blocks, and for what uses?
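
On the point before step 1: checking IsFullBackup without a local database means opening the (decrypted) dlist zip and reading its small fileset entry. A rough Python sketch of that read -- the fileset entry name and its JSON shape are my assumptions about the current dlist layout, so treat it as illustration only:

import json
import zipfile

def is_full_backup(dlist_path):
    # Assumption: formalized partial backups added a small "fileset" entry to
    # the dlist zip containing JSON such as {"IsFullBackup": true}. Older
    # dlist files without the entry are treated as full backups here.
    with zipfile.ZipFile(dlist_path) as z:
        if "fileset" not in z.namelist():
            return True
        with z.open("fileset") as f:
            return bool(json.load(f).get("IsFullBackup", True))

# Example (file name is illustrative):
print(is_full_backup("duplicati-20240817T000000Z.dlist.zip"))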

  --index-file-policy (Enumeration): Determines usage of index files
    The index files are used to limit the need for downloading dblock files
    when there is no local database present. The more information is recorded
    in the index files, the faster operations can proceed without the
    database. The tradeoff is that larger index files take up more remote
    space and may never be used.
    * values: None, Lookup, Full
    * default value: Full

This option complicates the scheme, although the vast majority of users probably stick with the default. In that case the vol folder gives each dblock file's blocks, and the list folder gives blocklist copies, allowing blocklist expansion to be verified without dblock reading.
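
To illustrate that last point, a rough Python sketch. It assumes the usual dindex layout as I understand it: a vol folder with one entry per dblock file listing its blocks, and a list folder whose entries are raw blocklist copies (concatenated 32-byte block hashes) named by the Base64 of the blocklist's own SHA-256. Those details are assumptions; the point is only that the check needs no dblock download:

import base64
import hashlib
import zipfile

def check_dindex_blocklists(dindex_path):
    # Walk the list/ entries of a (decrypted) dindex zip and confirm each
    # blocklist hashes to the name it is stored under.
    with zipfile.ZipFile(dindex_path) as z:
        for name in z.namelist():
            if not name.startswith("list/"):
                continue
            data = z.read(name)
            b64 = name[len("list/"):]
            b64 += "=" * (-len(b64) % 4)  # restore any stripped padding
            claimed = base64.urlsafe_b64decode(b64)
            status = "ok" if hashlib.sha256(data).digest() == claimed else "MISMATCH"
            print(f"{name}: {len(data) // 32} block hashes, {status}")

# Example (file name is illustrative):
check_dindex_blocklists("duplicati-i0123456789abcdef.dindex.zip")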

Robustness against corrupted backups is still an issue of course, and I'm not sure how much simpler the plan gets compared to the old one.

I have some tools that rely on the default Full index-file-policy to simplify things; otherwise, they would be much harder.

Using recovery tool with missing dlist files uses dindex blocklist copies to reassemble multi-block files from blocks in a local folder.

checker12.zip was written to try to predict whether a recreate would wind up in the annoyingly slow dblock download situation.

This has had some changes since, as I got tired of the noise from some benign issues. The other limitation of a simple script that makes summary output is that finding details means uncommenting the right print lines and going over the output again.

The next incomplete effort was to see if an SQLite database would be an easier fault-analysis vehicle than a large text file. While I was at it, I looked to see if large batched SQL INSERTs were faster than one-at-a-time inserts. As expected, this seems to add some speed, although having very little error checking (the purpose here is to be a fairly direct replica of the destination) probably helped too.
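
The speed difference is mostly generic SQLite behavior (rows per statement and per transaction) rather than anything Duplicati-specific. A small self-contained Python comparison, using a made-up table rather than the real schema:

import sqlite3
import time

rows = [(f"hash{i}", 102400, i % 100) for i in range(100000)]

def load(batched):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Block (Hash TEXT, Size INTEGER, VolumeID INTEGER)")
    start = time.perf_counter()
    if batched:
        # Many rows per statement, one commit at the end.
        con.executemany("INSERT INTO Block VALUES (?, ?, ?)", rows)
        con.commit()
    else:
        # One INSERT and one commit per row.
        for r in rows:
            con.execute("INSERT INTO Block VALUES (?, ?, ?)", r)
            con.commit()
    return time.perf_counter() - start

print("one-at-a-time:", load(False))
print("batched:      ", load(True))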


kenkendk commented on August 17, 2024

Thanks @ts678 for the detailed comments and links to related work!

I was mostly annoyed with the current restore taking forever to recreate the database, and then I looked at the recovery tool and decided that it can be significantly improved. My steps above are not super clear when read with fresh eyes, but the idea is to basically mimic what the recovery tool does, when only doing restores.

Perhaps we can even look at using the recovery tool to do the restores, although it does not currently support metadata and symlinks.

I have made some updates to the recovery tool, showing that the restore can get significantly faster, and I think the regular restore should be at least as fast.


CCWTech commented on August 17, 2024

Is there no way to duplicate the database onto the destination storage, so that in the event of a server crash you would have the full database and it wouldn't have to rebuild?


ts678 commented on August 17, 2024

Is there no way to duplicate the database

There are ample DIY ways. The question is whether or not you would be willing to spend the time and space that it would take.
For some people, the answer might be yes. For others, a database recreate is fast enough. Unfortunately, the speed may vary...

When index files are sufficient, it's merely slow. When a search through all dblock files is needed to find blocks, it can get SLOW.
That's one reason I wrote the block checker script, except it would be better if Duplicati could detect such errors ahead of time...

For an example of a do-it-yourself approach:

run-script-after gets environment variables. Example Scripts describes the plan. Ones you might like for your database copy are:

  • DUPLICATI__dbpath for the path to copy. You might want to encrypt the copy, or maybe the destination is considered a low enough risk.
  • DUPLICATI__REMOTEURL if you want to store it with the backup files, for example by using Duplicati.CommandLine.BackendTool.exe.
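
A minimal sketch of such a run-script-after hook, written in Python for readability (a .bat or shell script works just as well). The two environment variable names and the BackendTool put syntax are from above; the operation-name check, install path, and file names are this example's assumptions:

import os
import subprocess

# Assumed: DUPLICATI__OPERATIONNAME distinguishes backups from other operations,
# as in the example scripts. Only copy the database after a backup run.
if os.environ.get("DUPLICATI__OPERATIONNAME") == "Backup":
    dbpath = os.environ["DUPLICATI__dbpath"]      # local job database
    remote = os.environ["DUPLICATI__REMOTEURL"]   # backup destination URL
    tool = r"C:\Program Files\Duplicati 2\Duplicati.CommandLine.BackendTool.exe"
    # Upload the database, unencrypted, next to the backup files.
    # Passing arguments as a list avoids the cmd quoting issue mentioned later.
    subprocess.run([tool, "put", remote, dbpath], check=True)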

EDIT:

Depending on the job schedules, you might also have a small Duplicati job run after the actual backup jobs to back up the databases.
The database for this small job faces the same recreation challenge, but it should be pretty small, so it is a bit easier to deal with.


scottalanmiller commented on August 17, 2024

In the meantime, why not have Duplicati automatically (or as an option, but a default and obvious one) back up its own database, and then default to restoring that? Not the final solution, but wouldn't that fix things for most users immediately?


ts678 commented on August 17, 2024

why not have

The general answer to any enhancement request is that volunteer (and now also paid) resources are not enough for all wishes.

These requests are both kind of off-topic, but if there is not already an open enhancement issue, perhaps you could open one.

I'm not the person who picks what to do, but in general, new features are outweighed by other needs, which you can see here.

New features are at the bottom.


CCWTech commented on August 17, 2024


ts678 commented on August 17, 2024

I think it stopped! The command window

@CCWTech

Does this have anything to do with the topic? For support, ideally use https://forum.duplicati.com/, or if GitHub must be used, please open a support issue that explains things fully.


CCWTech commented on August 17, 2024


Sorry, posted here by mistake.


ts678 commented on August 17, 2024

Thanks for clarifying. Anyway, on the database copy request, I couldn't find that anyone has opened an issue yet, but continuing:

Although there's no telling what features will be prioritized when, especially with the commercial Duplicati, Inc. involved, for now

"C:\Program Files\Duplicati 2\Duplicati.CommandLine.BackendTool.exe" put "%DUPLICATI__REMOTEURL%\" "%DUPLICATI__dbpath%"

in run-script-after will do an unencrypted backup of the database to the destination -- if Duplicati doesn't get stopped before it runs.
With a database copy, the question arises of what happens if the destination and the database fall out of sync. It can cause issues.
The local SQL database more accurately reflects the destination (as it should), although SQL transactions can lead to surprises.

Copy DB to backup storage to speed up desaster restore on another computer? Blocksize on restore? has shown a bash script that is a lot more complex than a one-liner like mine above, but it uploads via curl, and many of its lines are devoted to that.


ts678 commented on August 17, 2024

In case anyone was wondering about the backslash before my closing double quote above, it's because my test used a folder URL.
These often end in a backslash, but Windows treats backslash followed by a double quote as a literal double quote. A double backslash avoids this.
Other types of target URL might not need it. To continue with the YMMV talk, one specific DB-out-of-sync risk is a compact.

Compacting files at the backend describes how backup deletion means some destination data blocks are no longer useful, and the wasted space is removed by repackaging the still-needed blocks into new dblock files. The new dblock is uploaded before the old one is deleted, so a block is always available through its dindex file, but a previous remote database copy knows nothing of this, so it will get surprised.

There might not be a compact on every backup (logs will say), and backup runs aren't always interrupted at unfortunate times.
Any coded-into-Duplicati system should be able to deal with such events, but it can be difficult even with just a local database.

Backup with no-auto-compact, along with an occasional manual compact, can help do-it-yourself copying, and you might usually enjoy a faster, somewhat manual alternative to direct restore. However, you might need to run repair to remove some extra files: uploading files is what a backup does, and the database from a past backup will see them as extra, but a repair can remove them. Perceived-as-missing files, such as dblock files deleted by a compact that was interrupted before the database upload, are hard to deal with.

How the backup process works shows how a dlist file references blocks by hash, so dblock files can be rearranged by compacts.

/*
The individual block hashes,
mapped to the containing remote volume
*/
CREATE TABLE "Block" (
"ID" INTEGER PRIMARY KEY,
"Hash" TEXT NOT NULL,
"Size" INTEGER NOT NULL,
"VolumeID" INTEGER NOT NULL
);

shows how the database (as built by backup or recreate, including the one in direct restore) references blocks by dblock volume.
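
As a concrete illustration, a small Python/sqlite3 lookup against a copy of the job database. The Block table is the one quoted above; the join assumes the usual RemoteVolume table with ID and Name columns, and the file name and hash below are only examples:

import sqlite3

def volume_for_hash(db_path, block_hash):
    # Map a block hash to the remote dblock volume the database says holds it.
    con = sqlite3.connect(db_path)
    row = con.execute(
        "SELECT RemoteVolume.Name, Block.Size FROM Block "
        "JOIN RemoteVolume ON RemoteVolume.ID = Block.VolumeID "
        "WHERE Block.Hash = ?",
        (block_hash,),
    ).fetchone()
    con.close()
    return row

# Example lookup with an illustrative database copy and Base64 SHA-256 hash.
print(volume_for_hash("backup-copy.sqlite", "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="))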


kenkendk commented on August 17, 2024

@CCWTech & @scottalanmiller The logic behind the local database is that it provides a view of the data in a way that is optimized for the backup process. The only reason this is needed is to avoid indexing the backup data before starting the backup. But since the data is already present, Duplicati uses the same data to do the restore.

The logic back then was to rebuild whatever part of the database was required for the restore to complete, and then run the same code for restore, regardless of having a partial or fully populated database. This works nicely until a certain size, where the recreate process explodes in size, causing it to be "stuck" in the recreate step for way too long.

Thanks @ts678 for linking the relevant DIY methods.

Going forward, I don't think storing this database with the backup is a good idea, because:

  • The restore does not need it (...except for speed issues)
  • It changes as part of the backup process
  • The size can be quite large (all paths * all hashes * all versions)

As mentioned in this issue, I think a better way forward is to revisit the idea of using the local database for restores. If we can optimize the reconstruction of the local database so that it performs at a speed similar to what the recovery tool can achieve, that would solve the problem.

If that is not possible, two alternatives are on my map:

  • Change the local database format to be more "restore friendly"
  • Change the restore process to use a different process when a local database is not already present

I tested the second option in #5243, which shows an immense speedup with hardly any memory usage.

