lex-2008 / backup3
Backups using rsync, busybox and SQLite
License: MIT License
Currently it loops through the whole alltimes array, checking each element like this:
changesCache[time]=alltimes.filter(a=>a<=time).length;
Instead, it should use the fact that alltimes is ordered (TODO: it should be) and check only the elements between the previously checked item and the current one - a single merge-style pass over sorted data gives all counts in O(n) instead of O(n²).
clean.sh should have an option to keep at least N monthly/hourly/daily backups.
Currently, output of comm gets separated into "new" and "deleted" files, which are fed into four "{operate,sql} on {old,new} files" loops. Instead, it should be like this:
output of comm is cleaned [*] and goes into two pipes: one for file operations and one for sql
[*] "cleaned" means: sorted (this requires the $BACKUP_LIST file to contain file sizes, too, since we add them to the database. But maybe that's a good thing?), inode numbers removed, and tabs maybe replaced with a simpler way of distinguishing new/deleted files (N/D as before).
Sorting output of comm has pros and cons:
- we have to wait for comm to finish, but that seems to have small impact in real life anyway
- it will slow backup.sh a little, but will make it possible to rebuild the database from files
Also edit the dedup and check scripts, because they're changing the deleted timestamp.

The ".." path in api.sh should be corrected/ignored.

Add a user-defined "importance" SQL expression (default 1) - so that, based on other factors like dirname and created/deleted dates, we could define the relative importance of files (see the sketch after the examples below).
Issue here is that it will be harder to show on the webUI - it's not clear what times are still valid. For example, what should the progress bar show in these cases:
- files in a/b/ are cleared twice as early as in a/
- *.bak files are deleted twice as early, and we're in a directory which has both *.bak and "normal" files
filename = "$1"
)filename LIKE "$1"
) - user provides %
as neededfilename LIKE "%$1%"
) - adds %
before and after automaticallydirectory
dirname LIKE "%$2%"
)created
deleted
existed
case?
result: table
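A rough sketch of the substring variant as api.sh might run it (assuming the history table's columns as above; escaping of user input is omitted):

pat="$1"    # filename substring
dir="$2"    # optional directory substring
sqlite3 -header -column "$BACKUP_DB" "
    SELECT dirname, filename, created, deleted
    FROM history
    WHERE filename LIKE '%$pat%'
      AND dirname LIKE '%$dir%'
    ORDER BY dirname, filename;
"

With -header -column, sqlite3 already prints the result as a table.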
Note that it will break the "simple" method.
Reason: if we sync some dir only hourly, there's no need to run find on it every time. Moreover, if run_rsync figured out that we can't connect to the target, we don't need to do it either.
Solution is to run all the find - comm - sed / sqlite stuff at the end of each run_rsync part, roughly as sketched below.
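A minimal sketch of that shape, assuming run_rsync gets one source and one name per call; the variable names and the size<TAB>path list format are assumptions, not the repo's actual conventions:

run_rsync() {
    src="$1"; name="$2"
    rsync -a "$src/" "current/$name/" || return 1   # can't connect: skip the scan too
    T=$(printf '\t')
    find "current/$name" -type f -printf "%s$T%p\n" | sort > "tmp/$name.new"
    touch "tmp/$name.old"
    # comm -3: column 1 = only in old list (deleted), column 2 = only in new (new)
    comm -3 "tmp/$name.old" "tmp/$name.new" \
        | sed -e "s/^$T/N$T/" -e t -e "s/^/D$T/" > "tmp/$name.diff"
    mv "tmp/$name.new" "tmp/$name.old"
    # ...feed tmp/$name.diff into the file and sql pipes here...
}

(busybox find needs FEATURE_FIND_PRINTF enabled for -printf.)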
Main issue is that df output is unreliable, so clean.sh should be reworked:
- it should run separately from backup.sh, about once an hour
- when less than 10% disk space is left, it should start deleting files, keeping note of their sizes
- when the sum of sizes of deleted files is more than 10% of disk space, stop and run sudo btrfs balance start -dusage=50
- main backup.sh should just refuse to do anything when there's less than 10% disk space free
A rough sketch of the loop follows.
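Sketch of the hourly loop, assuming the backup filesystem is mounted at $BACKUP_FS and a hypothetical pick_next_victim helper prints one deletable file per call:

free_pct() { df -Pk "$BACKUP_FS" | awk 'NR==2 {print int($4*100/$2)}'; }
total_kb=$(df -Pk "$BACKUP_FS" | awk 'NR==2 {print $2}')
freed_kb=0
while [ "$(free_pct)" -lt 10 ]; do
    f=$(pick_next_victim) || break                 # hypothetical helper
    freed_kb=$((freed_kb + $(du -k "$f" | cut -f1)))
    rm -f "$f"
    # after freeing ~10% of the disk, stop and let btrfs rebalance,
    # since df output can't be trusted until then
    if [ $((freed_kb*10)) -ge "$total_kb" ]; then
        sudo btrfs balance start -dusage=50 "$BACKUP_FS"
        break
    fi
done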
otherwise all files are downloaded instead of being shown in a browser.
Stack Overflow suggests using file, but I haven't installed it on the target machine yet.
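Once file is there, the usual trick is to emit its guess as the Content-Type header before the body, e.g. (here $f is the file being served):

printf 'Content-Type: %s\r\n\r\n' "$(file -b --mime-type "$f")"
cat "$f"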
As of now, directories are not tracked in the "data" directory and exist only as records in the database. As a result, rebuild.sh will happily wipe empty directories. And they might be needed, for example, for mount points.
Instead of doing all calculations server-side, we should just
SELECT * FROM history WHERE dirname = '$root' OR dirname LIKE '$root/%'
and parse the output client-side
(the dirname = '$root' part is to check for files in that one dir, because otherwise it makes no sense).
To show it in "show all versions of file" dialog
it should speed up checking
show.sh should exclude root dirs
api.sh should call it
button should call api.sh
Evolution of #10, but first check performance.
par2create -n1 $filename
Recovery block count: 100 - and save it in database (not sure why it's needed)
rm "$filename"* - will have to read the directory anyway
"old files" should be done before "new files" - to decrease the "for_update" index
"old files" should work in one pipeline: the sqlite query should both update the DB and produce output for file operations - like backup1.sh does (one possible shape is sketched below)
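One way to get that single pipeline is SQLite's RETURNING clause (SQLite 3.35+); the column names and the old_files join below are assumptions for illustration:

sqlite3 "$BACKUP_DB" "
    UPDATE history
       SET deleted = '$BACKUP_TIME'
     WHERE (dirname, filename) IN (SELECT dirname, filename FROM old_files)
    RETURNING dirname || '/' || filename;
" | while IFS= read -r f; do
    mv "current/$f" "data/$f~$BACKUP_TIME"   # illustrative file operation
done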
maybe increase pipe size
https://unix.stackexchange.com/questions/439196/change-buffer-size-of-named-pipe
mv newfile oldfile
par2verify...
<press Ctrl+C here>
mv oldfile newfile
newfile might be left with oldfile's name
or just regenerate it on --fix
so they are not deduplicated
useful when the first selected one is password-protected
After #35 is fixed, we should keep N dirs for these "protected" views.
This will make backups easier to restore if the main backup/restore script is gone.
should use direct links instead of JS
First mention features/benefits, then installation on all platforms: add $package_manager install to the first two lines of https://github.com/Lex-2008/backup3#requirements. Then configuration (simple/advanced usage), WebUI, and all other complications can be moved to a separate "extra features" file.
Currently, we run find and add to the backup whatever state the "current" directory is in after the rsync run finished - no matter whether it was successful or partial. It can lead to an issue like this:
if you have edited a few files which depend on each other (for example, shell scripts which call each other), an rsync interrupted due to timeout might have copied only one of them. In this case, if you later restore a backup from this time, you end up with a broken system (old version of one file, new version of another).
Solution is to rsync to a temporary dir and ... that one to the "current" one. Note that inodes, also of directories, must not change! One possible shape is sketched below.
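A minimal sketch under these constraints: stage on the same filesystem with --link-dest so unchanged files keep their inodes, then merge only changed files into "current" so directory inodes survive too (names are illustrative; deletions and filenames with newlines are not handled):

rsync -a --link-dest="$PWD/current" "$SRC/" stage/ || exit 1   # partial run stops here
( cd stage && find . ! -type d ) | while IFS= read -r f; do
    [ "stage/$f" -ef "current/$f" ] && continue   # -ef: same inode, i.e. unchanged
    mkdir -p "current/$(dirname "$f")"            # existing dirs keep their inodes
    mv -f "stage/$f" "current/$f"                 # same fs: the new file keeps its inode
done
rm -rf stage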
read without -r will mangle backslashes - shellcheck
causing back-in-time dates (where created > deleted) for frequently changing files (rsync.log).
Should be fixable by setting $BACKUP_TIME after, not before, calling acquire_lock.
Issue happens when several (>=3) instances of backup.sh run in parallel, for example:
- 12:00 instance 0 takes the lock and runs for a long time
- 12:01 instance A starts, sets BACKUP_TIME=12:01, and waits for the lock
- 12:05 instance B starts, sets BACKUP_TIME=12:05, and waits for the lock
- instance 0 finishes; the lock isn't FIFO, so B happens to get it first, sees a new file, and records it as created at its BACKUP_TIME, 12:05
- the file disappears from the source before A finally gets the lock, so A records it as deleted at its BACKUP_TIME, 12:01
And, behold: we get a file which is created at 12:05 and deleted at 12:01. This shouldn't happen.
First, render dirs
Then, ask for current files (or history of all files?)
Check for current selected timestamp before showing files
ALSO when requesting password:
First, clean everything
Then, request…
Currently, it's there to keep time
cd to bindir and run git pull, weekly or so.
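For example, as a weekly cron job (the path is illustrative):

# crontab: 0 4 * * 0 /opt/backup3/self-update.sh
cd /opt/backup3 && git pull --ff-only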
When there was no change in backed-up files, files.txt.diff will look like this:
D 0 backup.db
N 0 backup.db
In this case, we should exit instead of processing (otherwise it only creates a duplicate database backup) - see the sketch below.
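Sketch of the early exit, assuming the line format above:

# if every diff line is about backup.db itself, nothing really changed
if ! grep -qv ' backup.db$' files.txt.diff; then
    exit 0
fi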
Issue is that in the migration process, each directory gets a new inode every time (on every step). Hence, it is considered deleted-then-created and receives a new entry. Properly, this could be fixed by nullifying directory inode numbers in backup.sh, like this:
UPDATE fs SET inode=0 WHERE type='d';
but then it would be required on every backup operation, which sounds wrong.
Hence, a workaround - we'll just discard all directory entries and rebuild them from scratch. Note that it will lose directory data - just like rebuild.sh does. Execute these commands in your SQLite console:
-- drop all existing directory entries; they'll be rebuilt below
delete from history where type='d';
-- walk every dirname from history, splitting off one path component
-- per recursion step, carrying min(created)/max(deleted) along
WITH RECURSIVE cte(org, parent, name, rest, pos, data1, data2) AS (
    SELECT dirname, '', '.', SUBSTR(dirname,3), 0, min(created), max(deleted)
    FROM history
    GROUP BY dirname
    UNION ALL
    SELECT org,
        SUBSTR(org,1,pos+length(name)+1) as parent,
        SUBSTR(rest,1,INSTR(rest, '/')-1) as name,
        SUBSTR(rest,INSTR(rest,'/')+1) as rest,
        pos+length(name)+1 as pos,
        data1, data2
    FROM cte
    WHERE rest <> ''
)
-- re-insert one row per (parent, name) directory, spanning the earliest
-- creation and latest deletion of anything inside it
INSERT INTO history (inode, type, dirname, filename, created, deleted, freq)
SELECT 0, 'd', parent, name, min(data1), max(data2), 0
FROM cte
WHERE pos <> 0
GROUP BY parent, name;
analyze;
vacuum;
It uses a recursive CTE, as explained in https://stackoverflow.com/a/34665012
it will enable the "LIKE optimisation" and make building the old_files table faster.
It should probably also include the starting ./ - so for files in the root dir, dirname will be ./, and for others ./dirname/ or ./path/to/
The old2db check assumes that each file in the data dir has ~ in its name and a corresponding entry in the database. That's not true for *.par2/*.bak backup files.
If you back up a directory with the following states:
- time 1: a/b is a file
- time 2: a is a file
- time 3: a/c is a file
then, after rebuild.sh, the database will have the following entries (among others):
- a (as a directory) from time 1 until now
- a (as a file) from time 2 until time 3
Note that they clearly overlap. check.sh --fix will "fix" it in an unclean way - likely the directory will exist only from time 1 until time 2, but not at time 3.
run scripts in set -e mode, and print if a command fails.
Something like:
https://unix.stackexchange.com/questions/21930/last-failed-command-in-bash
https://stackoverflow.com/questions/3822621/how-to-exit-if-a-command-failed
Note, however, that this is busybox ash, not bash :)
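busybox ash has no $BASH_COMMAND, so one workable trick is tracking the current step by hand and reporting it from an EXIT trap (a sketch; the step names and update_db are illustrative):

set -e
step="startup"
trap 'echo "FAILED during: $step" >&2' EXIT
step="rsync";      run_rsync
step="db update";  update_db
trap - EXIT   # reached the end: clear the trap so a clean exit stays quiet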
Currently, only top-level dirs are password-protected.
Instead, there should be a list of dirs to be protected, in an administrator-managed file.