przemoc / metastore
Store and restore metadata from a filesystem.
Home Page: http://software.przemoc.net/#metastore
License: GNU General Public License v2.0
Attempting to apply metadata to a filesystem where the current UID/GID is not in /etc/passwd results in a "getpwuid failed" error on startup, and a failure to change the ownership of the files specified in .metadata, with a "removed" message when the files were actually there.
(Situation occurred after running an rsync from a host when restoring, and the UID/GID in question restored by rsync was a different UID/GID to the original.)
Output should be similar to:
$ ls -alp --time-style=full-iso
total 4
drwxr-xr-x 1 przemoc przemoc 26 2015-09-06 19:45:34.222175902 +0200 ./
drwxr-xr-x 1 przemoc przemoc 2648 2015-09-07 16:02:10.047591753 +0200 ../
-rw-r--r-- 1 przemoc przemoc 99 2015-09-06 19:45:34.222175902 +0200 .metadata
-rw-r--r-- 1 przemoc przemoc 0 2015-09-06 19:27:41.334181566 +0200 test
The current idea is to output tab-separated columns, with the mtime column formatted as +%F %T.%N %z.
If a file has extended attributes, then each attribute name and its value will be shown on a new line in the 6th column (xattr), with only the 5th column (path) not cleared. The value will be shown as text in quotes if all bytes are within the 32-126 range, or as hex prefixed with 0x otherwise. Example:
$ metastore -d
-rw-r--r-- przemoc przemoc 2015-09-06 19:27:41.334181566 +0200 ./test
./test user.txt="tekst"
./test user.bin=0x020100ff00
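The quoting rule for xattr values could be sketched roughly as below. This is only an illustration of the rule as described, not metastore's code: format_xattr_value is a hypothetical name, and since the description does not say how a quote character inside a textual value is handled, the sketch leaves it unescaped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from metastore): render an xattr value
 * as "text" when every byte is in the 32-126 range, or as 0x-prefixed
 * hex otherwise. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if the buffer is too small.
 */
int format_xattr_value(const unsigned char *val, size_t len,
                       char *out, size_t outsz)
{
    assert(out != NULL && (val != NULL || len == 0));

    int printable = 1;
    for (size_t i = 0; i < len; i++) {
        if (val[i] < 32 || val[i] > 126) {
            printable = 0;
            break;
        }
    }

    /* Quoted text needs len + 2 quotes; hex needs "0x" + 2 digits per byte. */
    size_t need = printable ? len + 2 : 2 + 2 * len;
    if (need + 1 > outsz)
        return -1;

    if (printable) {
        out[0] = '"';
        if (len > 0)
            memcpy(out + 1, val, len);
        out[len + 1] = '"';
        out[len + 2] = '\0';
    } else {
        char *p = out;
        p += sprintf(p, "0x");
        for (size_t i = 0; i < len; i++)
            p += sprintf(p, "%02x", (unsigned)val[i]);
    }
    return (int)need;
}
```

Called with the five bytes 02 01 00 ff 00 it fills the buffer with 0x020100ff00, and with the bytes of tekst it produces the quoted form shown in the example.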
Path order will be undefined, but you'll be able to pipe the output to LC_ALL=C sort -t $'\t' -k5 (if you don't have bash/zsh, then replace $'\t' with a literal tab in quotes).
By default dump should use the existing metastore file (typically .metadata), as shown in the example above, but it should also be able to dump the metastore file that would be created if the save action was used with a given path. Example:
$ metastore -d .
-rw-r--r-- przemoc przemoc 2015-09-06 19:45:34.222175902 +0200 ./.metadata
drwxr-xr-x przemoc przemoc 2015-09-06 19:45:34.222175902 +0200 ./
-rw-r--r-- przemoc przemoc 2015-09-06 19:27:41.334181566 +0200 ./test
./test user.txt="tekst"
./test user.bin=0x0201000000
The dump action is meant only as a helpful debugging facility/merge conflict helper. Do not ever compare dumps taken using different metastore versions. Do not rely on the current output format (especially in batch scripts), because it may change in the future without prior notice.
Hi.
After trying to apply xattrs using metastore -a -v -f /somedir/metafile /sourcedir, I get the following errors:
./test/test1/vzd_test: adding xattr system.posix_acl_default
lsetxattr failed: Bad address
strace reports this:
lsetxattr("./test/test1/vzd_test", "system.posix_acl_default", 0x148, 52, XATTR_CREATE) = -1 EFAULT (Bad address)
lsetxattr("./test/test1/vzd_test", "system.posix_acl_default", 0x148, 52, XATTR_REPLACE) = -1 EFAULT (Bad address)
If you wonder what the second lsetxattr() with XATTR_REPLACE is doing after the first one, I must admit I patched the original metastore.c by adding a second lsetxattr call in case of an error in the first call, but it doesn't help.
File system is ext4, mounted with both acl and user_xattr:
/dev/sda1 on /mnt/disk type ext4 (rw,user_xattr,acl)
Apply is being done on the same filesystem as save.
I am using the latest git clone.
Regards,
Uros
Judging from the system monitor, metastore -s only uses one thread. I'm naively assuming that at some point it has to walk down a file and directory tree and visit its nodes recursively or iteratively. I propose to put the file paths of a directory into queues in groups of <= 100, from which n threads can poll and create the file output, which can then be written into a large buffer (in order to avoid an I/O bottleneck). In case the output needs to be ordered, all threads need a sequence number and others must not proceed until the lowest has finished (all threads have to do nothing but stat calls, which should cause quite equal load on each thread).
As can be seen in commits history, there is barely any development going on, but there were some bugfixes since the last released v1.1.2 from 5.5 years ago (NEWS file mentions them), so it would be good to release new version, v1.1.3. I feel bad it didn't happen earlier.
To make it happen I need to do following steps before:
And it was similar struggle in the past. GPG always seemed a bother to me. Maybe it's just me, or maybe other folks who use it rarely (i.e. not even monthly) can relate.
Side note:
Truth is that I don't really use metastore myself (and it's like that for many years already), that's why the project had not seen much love other than fixing bugs.
since it messes up binaries installed by package management (e.g. on Debian based systems). Either the prefix should be /usr/local or - even better - configurable.
Consider switching to autoconf, which might seem overkill, but avoids creating a configure script now which later will be replaced by autoconf anyway or always consume maintenance costs.
When switching to autoconf, #23 has to be reviewed.
It's desirable to introduce new metadata file format that would be human-friendly and merge-friendly (when used in VCS like git), so making it textual is an obvious choice. Such format should be compact (no XML!), but not too compact. Below you can see current version of my draft amendment.
Data types
----------
SSTRING - `;`-terminated string with special characters (`\n`, etc.)
and semicolon escaped
" v001t\n\n" file format
------------------------
HEADER
N * PENTRY
PENTRY format
-------------
SSTRING - Path
BSTRING(1) - Parameter:
"m" - mode
"o" - owner
"t" - mtime
"x" - xattr
BSTRING(1) - "="
SSTRING - Parameter value
BSTRING(1) - "\n"
Parameter value formats
-----------------------
mode - octal mode
owner - "USER:GROUP"
mtime - UTC date+time in basic ISO8601 format (`%Y%m%dT%H%M%S.%NZ`)
xattr - "KEY1=VALUE1[,KEY2=VALUE2...]"
(keys and values have comma and equals sign characters escaped)
Example:
MeTaSt00r3 v001t
metastore.c;m=644;
metastore.c;o=przemoc:users;
metastore.c;t=20140302T162230.123456789Z;
metastore.c;x=;
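To make the mtime value format concrete, here is a minimal sketch that produces the %Y%m%dT%H%M%S.%NZ form from a struct timespec. The function name format_mtime is hypothetical, and the draft does not prescribe any particular implementation; the sketch only shows that the nanoseconds have to be appended by hand because strftime() has no portable %N.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: format a timestamp as the draft's mtime value,
 * i.e. UTC in basic ISO 8601 with nanoseconds (%Y%m%dT%H%M%S.%NZ).
 * Returns 0 on success, -1 on failure or if the buffer is too small.
 */
int format_mtime(const struct timespec *ts, char *out, size_t outsz)
{
    assert(ts != NULL && out != NULL);

    struct tm tm;
    if (gmtime_r(&ts->tv_sec, &tm) == NULL)
        return -1;

    /* Date and time down to whole seconds. */
    char stamp[32];
    if (strftime(stamp, sizeof stamp, "%Y%m%dT%H%M%S", &tm) == 0)
        return -1;

    /* Append the 9-digit nanosecond field and the UTC designator. */
    int n = snprintf(out, outsz, "%s.%09ldZ", stamp, (long)ts->tv_nsec);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

For tv_sec = 1393777350 and tv_nsec = 123456789 this yields 20140302T162230.123456789Z, the value used in the example line for metastore.c.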
Why not put all parameters in one line? Well, it would be more space-efficient, sure, but also more error-prone and less merge-friendly. So I say no for all file parameters in one line.
Why not put file name only once followed by parameters, each one in its own line? Because we lose contextlessness of each line then, and meaningful line without context is a really nice asset that I would like to have in such new format, for all your merge, grep, etc. intents and purposes.
OTOH support for gzipping can be still considered I think. Git has textconv, so the diff case can be handled well. For the (hopefully rare) merge case one can gunzip the file, fix it and re-gzip. Or do g(un)zipifying conversion by metastore (it depends on what would be gzipped, whole metastore file or only data after header?). Space savings coming from gzipping could be substantial for repositories with lots of files. Maybe disk space usage would be then even similar to the old format? Still, these merges, grr... If only git supported bidirectional textconv... :-)
Backward compatibility dictates that such new metadata format rather won't be a default one. There is arising need for metastore configuration file and I'll add a new issue for that.
In building metastore (on Fedora 20) I found that the include line for xattr.h triggered a "No such file or directory" response. Looking in a number of man pages, and the book "The Linux Programming Interface", by Michael Kerrisk, I found an alternate location, which worked properly for me.
Additionally, I needed to add an include for errno.h.
Both of these files are installed by the glibc-headers-* packages.
I couldn't determine on what distros the location <attr/xattr.h> is correct.
Thanks for your attention.
/ken
The patch is:
diff --git a/metaentry.c b/metaentry.c
index b0ea69d..02e5bb8 100644
--- a/metaentry.c
+++ b/metaentry.c
@@ -25,13 +25,14 @@
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
-#include <attr/xattr.h>
+#include <sys/xattr.h>
#include <limits.h>
#include <dirent.h>
#include <sys/mman.h>
#include <utime.h>
#include <fcntl.h>
#include <stdint.h>
+#include <errno.h>
#include "metastore.h"
#include "metaentry.h"
diff --git a/metastore.c b/metastore.c
index de1bf07..0a49e3f 100644
--- a/metastore.c
+++ b/metastore.c
@@ -23,10 +23,11 @@
#include <sys/stat.h>
#include <getopt.h>
#include <utime.h>
-#include <attr/xattr.h>
+#include <sys/xattr.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
+#include <errno.h>
#include "metastore.h"
#include "settings.h"
Error:
gcc -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4 -g -Wall -pedantic -std=c99 -D_FILE_OFFSET_BITS=64 -O2 -Wl,-O1,--sort-common,--as-needed,-z,relro -lbsd -o metastore utils.o metastore.o metaentry.o
metaentry.o: In function `mentries_dump':
~/metastore/metaentry.c:643: undefined reference to `strmode'
This was caused by the default LDFLAGS="-Wl,--as-needed" on Arch Linux, Gentoo and, I guess, many other modern distributions.
According to https://wiki.gentoo.org/wiki/Project:Quality_Assurance/As-needed#Importance_of_linking_order, libraries must be passed to linker after the object files and the static archives.
A --dump option that dumps the contents of the .metadata file in a format similar to ls -l would be very useful in debugging and troubleshooting merge problems, etc.
We have to avoid preprocessor macro spaghetti, such as per-OS approaches to extended attributes.
We already have NO_XATTR for systems not supporting them, but if a system supports them differently than Linux, then a common OS-agnostic API should be introduced internally, with OS-specific implementations for different POSIX systems.
Let's not tackle non-POSIX systems like Windows, as it would require even bigger changes that may not really be suitable for metastore.
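One possible shape for such an internal API, sketched under assumptions: the name mxattr_set is hypothetical, only Linux and macOS backends are shown, and every other system falls back to the NO_XATTR behaviour. The point is only that the conditionals live inside one function, so callers never see them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#if !defined(NO_XATTR) && (defined(__linux__) || defined(__APPLE__))
#include <sys/xattr.h>
#define MXATTR_SUPPORTED 1
#endif

/*
 * Hypothetical internal API: set an extended attribute on a path without
 * following symlinks. Returns 0 on success, -1 with errno set.
 */
int mxattr_set(const char *path, const char *name,
               const void *value, size_t size)
{
    assert(path != NULL && name != NULL);
#if !defined(MXATTR_SUPPORTED)
    /* NO_XATTR build or an OS without a backend in this sketch. */
    (void)path; (void)name; (void)value; (void)size;
    errno = ENOTSUP;
    return -1;
#elif defined(__linux__)
    return lsetxattr(path, name, value, size, 0);
#else
    /* macOS has no lsetxattr(); XATTR_NOFOLLOW gives the same effect. */
    return setxattr(path, name, value, size, 0, XATTR_NOFOLLOW);
#endif
}
```

Matching get/list/remove wrappers would follow the same pattern, and a further backend would add one more branch here rather than touching the callers.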
There were some existing efforts to make metastore work on systems different than Linux.
Moving source files to src/ subdirectory, etc.
A text file format is needed so I can track metadata change using the same way as the other files in git repository.
Willing to contribute.
Why choose a binary file format in the first place? Should we add the text file format as an alternative or just replace the original binary file format and eliminate the "-c" action?
Thank you, metastore is cool!
I tried to add support for atime, ctime and size in my branch: https://github.com/data-man/metastore/tree/other_times.
The current metastore hooks do not support branches that contain different dir/file permissions.
Curious if this is as intended -- compiling on Mac gives following:
btmacpro:metastore btorpey$ make
sed: 1: "/^1:/s,,,;/:/{s,[^:]*:, ...": bad flag in substitute command: '}'
gcc -g -Wall -pedantic -std=c99 -D_FILE_OFFSET_BITS=64 -O2 -DMETASTORE_VER="\"\"" -o metaentry.o -c ./src/metaentry.c
./src/metaentry.c:35:10: fatal error: 'bsd/string.h' file not found
#include <bsd/string.h>
^
1 error generated.
make: *** [metaentry.o] Error 1
btmacpro:metastore btorpey$
Thanks in advance for any help. Converting to git and can't live without mtime!
As I was mostly an out-of-git metastore user myself, I haven't caught it, but apparently the pre-commit hook's git add .metadata doesn't do what one may think it does, i.e. this example script is in fact broken with non-ancient git versions. Thus it has to be thoroughly tested and fixed along the way.
Additional note:
After introducing this feature, the --git option, or rather lack thereof, should be equivalent to --exclude=.git.
I have been looking at your tool and several others and they all seem to not be able to handle the following scenario. Suppose you have a large repo with lots of developers working on some of the same files. The file metadata is used to control the build process, which is lengthy due to the size of the project; the metadata is limited to file access and modification times.
How do you handle the file metadata when a developer decides to pull the official repo or merges his/her code with the official repo, and the official repo contains changes to the same files that he/she is working on? Obviously, you want to preserve the file changes, so a file contents merge will need to occur, but what to do with the metadata? What if the metadata in the official repo points to a time earlier than the metadata for such a file in the developer's node? What if the opposite is true? There are several scenarios at play here, but I think the 2 described above are the main ones.
Can metastore handle such scenarios?
metastore ignores ".git" which I think is obviously not enough.
The problem I encounter is that metastore unnecessarily records the metadata of the file .metadata.
I'd like to add an option "-x, --exclude=PAT" which accepts a pattern just like diff does.
Example:
$ metastore -a
./unknown-uid: changing metadata
./unknown-uid: changing owner from przemoc to przem0c
Issue can be noticed only when verbosity is increased.
$ metastore -av
...
./unknown-uid: changing metadata
./unknown-uid: changing owner from przemoc to przem0c
getpwnam failed: No error information
...
This should be improved.
That's not ideal for usage with git as a backup program, since it shouldn't be necessary to have a file owner present on the system where the backup is created.
Failure is assumed from the text of the warning. In case that's just a warning, it needs to be improved and/or labeled in order to sound like a warning.
This makes metastore more attractive for users who just want to use it with git and want something that just works.
see #11
When I install metastore on Cygwin, it shows the following error.
gcc -g -Wall -pedantic -std=c99 -D_FILE_OFFSET_BITS=64 -O2 -o metaentry.o -c ./src/metaentry.c
./src/metaentry.c:35:24: fatal error: bsd/string.h: No such file or directory
#include <bsd/string.h>
^
compilation terminated.
Makefile:56: recipe for target 'metaentry.o' failed
make: *** [metaentry.o] Error 1
I think there could be a more generic way, using string.h rather than bsd/string.h?
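A rough sketch of what a libbsd-free fallback might look like, assuming bsd/string.h is pulled in only for strmode() (the undefined reference to strmode in the linking report suggests so, but that is an assumption). The name mode_to_string is hypothetical; unlike the BSD function it writes ten characters rather than eleven and knows only the common file types.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical stand-in for BSD strmode(): fill out[11] with an
 * ls -l style string such as "-rw-r--r--" plus the terminating NUL.
 */
void mode_to_string(mode_t mode, char out[11])
{
    assert(out != NULL);

    /* File type character. */
    if (S_ISDIR(mode))       out[0] = 'd';
    else if (S_ISLNK(mode))  out[0] = 'l';
    else if (S_ISCHR(mode))  out[0] = 'c';
    else if (S_ISBLK(mode))  out[0] = 'b';
    else if (S_ISFIFO(mode)) out[0] = 'p';
    else if (S_ISSOCK(mode)) out[0] = 's';
    else                     out[0] = '-';

    /* Permission bits for user, group and others. */
    out[1] = (mode & S_IRUSR) ? 'r' : '-';
    out[2] = (mode & S_IWUSR) ? 'w' : '-';
    out[3] = (mode & S_IXUSR) ? 'x' : '-';
    out[4] = (mode & S_IRGRP) ? 'r' : '-';
    out[5] = (mode & S_IWGRP) ? 'w' : '-';
    out[6] = (mode & S_IXGRP) ? 'x' : '-';
    out[7] = (mode & S_IROTH) ? 'r' : '-';
    out[8] = (mode & S_IWOTH) ? 'w' : '-';
    out[9] = (mode & S_IXOTH) ? 'x' : '-';

    /* setuid, setgid and sticky bits replace the execute slots. */
    if (mode & S_ISUID) out[3] = (mode & S_IXUSR) ? 's' : 'S';
    if (mode & S_ISGID) out[6] = (mode & S_IXGRP) ? 's' : 'S';
    if (mode & S_ISVTX) out[9] = (mode & S_IXOTH) ? 't' : 'T';

    out[10] = '\0';
}
```

With something like this compiled in when libbsd is unavailable, the bsd/string.h include and -lbsd could become optional.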
Additional note:
Supporting .gitignore file syntax would be nice.
Steps to reproduce:
$ sudo mkdir /tmp/a
$ sudo chown root:root /tmp/a
$ sudo chmod 0700 /tmp/a
$ metastore -s /tmp/a/
getxattr failed for /tmp/a: Permission denied
*** Error in `metastore': free(): invalid pointer: 0x00007f40897e8c58 ***
Aborted (core dumped)
experienced with f1e4842
Configuration files will be optional and they will be read in following order:
/etc/metastore.conf (system options)
$HOME/.metastore.conf (global options)
$CWD/.metastore.conf (local options)
The format will be most likely INI-like (key = value), but without any sections.
Options theoretically required to support current features:
verbosity = INT - verbosity level (0 by default)
mtime = BOOL - should mtime be considered when applying or diffing metadata? (no by default)
empty-dirs = BOOL - recreate missing empty directories (no by default)
git = BOOL - do not omit .git directories (no by default)
file = STR - metadata file (.metadata by default)
Options required to support future features:
format = STR - metadata file format (0 by default, v001t for new one)
format-convert = BOOL - should metadata file be converted to chosen format even if present file uses other one? (no by default)
exclude = STR - exclude dirs/files that match pattern (.git by default)
exclude-from = STR - exclude dirs/files that match any pattern in file
exclude-reset = BOOL - removes all excludes defined earlier using exclude when true
exclude-from-reset = BOOL - removes all excludes defined earlier using exclude-from when true (no-op when false)
remove-empty-dirs = BOOL - remove empty directories not present in applied metadata file (no by default)
work-on-parameters = STR - parameters that should be considered when diffing or applying metadata ("mox" by default)
Some options theoretically required to support current features would be better not present at all. When exclude will be available, there will be no need for git, as it would complicate things more (ignoring .git directories being excluded...). Similarly work-on-parameters is much nicer than specific mtime.
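Parsing one line of such a section-less key = value file could look roughly like this. It is a sketch only: parse_conf_line and trim are hypothetical names, and since comment syntax and quoting are not specified above, neither is handled.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/*
 * Hypothetical parser for one "key = value" line. The line is modified
 * in place and *key / *value point into it. Returns 1 when a pair was
 * found, 0 for a blank line, and -1 for a malformed line (no '=' or an
 * empty key).
 */
int parse_conf_line(char *line, char **key, char **value)
{
    assert(line != NULL && key != NULL && value != NULL);

    char *start = trim(line);
    if (*start == '\0')
        return 0;

    char *eq = strchr(start, '=');
    if (eq == NULL)
        return -1;

    /* Split at the first '=' and trim both halves. */
    *eq = '\0';
    *key = trim(start);
    *value = trim(eq + 1);
    return (**key == '\0') ? -1 : 1;
}
```

Reading the three files in the listed order and letting later pairs override earlier ones would give local options precedence over global and system ones, which is presumably the intent of that order.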
During implementation and tests of new dump action, I've noticed that while non-textual values in extended attributes were properly stored in .metadata file, retrieving them from it was simply broken, i.e. anything beyond first null byte was zeroed.
Small quantum of solace is the fact that apparently metastore users rarely use extended attributes or at least rarely with non-textual values, because otherwise they would surely report such crucial bug.
Some of you feel unsure about metastore, because it's no longer accessible from David Härdeman's site, who is original author and maintainer of this useful tool. David no longer works on metastore. My repository is unofficial continuation, as I call it, but I already contacted David regarding possibility of ceding his maintainership of metastore and making my repository the official one. His response was positive. Such thing should be done publicly, though, and it will be done publicly, obviously with the help of git. When it will happen, then text about unofficial continuation will be removed from project description and this issue will be closed.
(It's not really an issue, rather kind of note for those curious about metastore.)
The --quiet option right now reduces, but does not eliminate, the output of messages. It can be useful in some contexts to make metastore completely silent. If two --quiet options are found on the command-line, only errors should be output, and adding a third --quiet should make it completely silent (a.k.a. "quiet").
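Counting repeated --quiet options is straightforward with getopt_long(). The sketch below is an illustration, not metastore's option parser: count_quiet is a hypothetical name, the short form -q is an assumption, and all other options are simply ignored.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical sketch: count how many times -q/--quiet was given.
 * A real parser would handle its other options in the same loop.
 */
int count_quiet(int argc, char *argv[])
{
    static const struct option longopts[] = {
        { "quiet", no_argument, NULL, 'q' },
        { NULL, 0, NULL, 0 }
    };
    int quiet = 0;
    int c;

    assert(argv != NULL);
    optind = 1;   /* restart scanning so the function can be called again */
    opterr = 0;   /* stay silent about options this sketch does not know */
    while ((c = getopt_long(argc, argv, "q", longopts, NULL)) != -1) {
        if (c == 'q')
            quiet++;
    }
    return quiet;
}
```

The returned count could then select the level proposed above: one keeps today's reduced output, two prints errors only, three or more prints nothing.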
Please make a new release of metastore soon. Having release numbers is very useful for packaging a program.
Hello!
Using this tool with git, for synchronising a set of files where I'd like to preserve timestamps!
It seems that metastore creates the .metastore with nanoseconds, but during application only applies the timestamp to the nearest second. Therefore, when synchronising back/forth, the nanoseconds are eliminated.
Could the tool be adjusted so that there's a choice over seconds vs. fractional seconds?
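For reference, POSIX utimensat() can set timestamps with nanosecond precision, so a choice between whole and fractional seconds could be sketched as below. This assumes the truncation comes from a seconds-only call such as utime(), which has not been confirmed here; apply_mtime and its keep_nsec flag are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <stdio.h>      /* perror() */
#include <sys/stat.h>   /* utimensat(), UTIME_OMIT */
#include <time.h>

/*
 * Hypothetical helper: set a file's mtime and leave its atime untouched.
 * When keep_nsec is 0 the fractional part is dropped, which mirrors the
 * seconds-only behaviour described above.
 * Returns 0 on success, -1 with errno set.
 */
int apply_mtime(const char *path, struct timespec mtime, int keep_nsec)
{
    assert(path != NULL);

    struct timespec times[2];
    times[0].tv_sec = 0;
    times[0].tv_nsec = UTIME_OMIT;   /* do not change atime */
    times[1] = mtime;
    if (!keep_nsec)
        times[1].tv_nsec = 0;

    /* AT_SYMLINK_NOFOLLOW: act on a symlink itself, not its target. */
    int rc = utimensat(AT_FDCWD, path, times, AT_SYMLINK_NOFOLLOW);
    if (rc != 0)
        perror(path);
    return rc;
}
```

Whether the fractional part survives also depends on the filesystem's timestamp resolution.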
Hi, I'd like to propose that a command-line option be added to metastore so that it records metadata only for files and directories specified by the user. This will be very helpful for a Git workflow, where the only files/dirs we eventually want to track and record in the Git repo are those under Git control. Currently the metadata of ALL files found in the current directory and its subdirs is recorded, so much of it is superfluous. This feature is complementary to #8, but is more specifically relevant to Git.
While investigating the generality of the currently suggested fix for issue #21, I discovered that it would be pretty straightforward to port it to FreeBSD. That might be a nice enhancement.
It would be helpful to have a --version parameter for the command, printing the current version of the binary and the versions of the most important dependencies, if that makes sense.
I have it compiling and working in Cygwin and MinGW in
https://github.com/rasa/metastore.git
but since Windows doesn't really support uids or gids, we need to tweak the code to use SIDs instead (at least for MinGW; Cygwin works around this somehow).
Cygwin's SID > uid mapping logic is at:
https://cygwin.com/cygwin-ug-net/ntsec.html