Comments (16)

fpemud commented on June 12, 2024
  1. What about the filename encoding and escaping?
  2. If an xattr contains a binary value, it should be displayed in hexadecimal representation.
  3. Orders (filename order, "m o t x" order, xattr order) should be defined, so that metastore generates the same result as long as the metadata is not changed.
  4. Every xattr should be put on its own line, not all together on one line, for convenience when comparing or merging.
  5. What if ";" is contained in a filename?
  6. There would be a lot of redundant data for long paths, such as "lib/plugins/os/microsoft-windows-vista/fvm_plugin.py"

from metastore.

fpemud commented on June 12, 2024

File format proposal (for reference only, your project, you make decision):

# header comment

[metastore.c]
mode=644
owner=przemoc:users
mtime=20140302T162230.123456789Z
xattr="name1":txt"value1"
xattr="name2":hex"00 01 02 03 04 05"

[lib/plugins/os/microsoft-windows-vista/fvm_plugin.py]
mode=644
owner=przemoc:users
mtime=20140302T162230.123456789Z
xattr="name1":txt"value1\\\""
xattr="name2\\\"":hex"00 01 02 03 04 05"

  1. I don't think HEADER is useful. AFAIK all configuration files and data files use a header comment.
  2. xattr names and values need escaping.
  3. This is the common INI file format. There should be libraries around that can do r/w operations.
  4. This file is UTF-8 encoded.

przemoc commented on June 12, 2024

@fpemud, thanks for your comments. My draft is rough at this stage and surely can be improved. I'll respond to some of your points and update the draft / issue description some time later (not today, though).

Your first comment:

  1. Non-system-dependent encoding is possibly desirable, but it's not a simple change. If we're going to introduce one encoding, then UTF-8 seems reasonable.

  2. Originally you suggested base64, then switched to hex representation. I already explained, but definitely not deeply enough, how I see treating special characters in SSTRING - they should be escaped. How? Special characters other than ; could be written in hex form (\xHH).

    I wasn't exposed much to xattrs, so I have to check what the real world puts there: is it always binary data, or is textual (or rather text-like) data more common? (I silently assumed the latter initially, but I may be wrong here.)

  3. Yes, strict ordering is rather obvious for file creation, but I should mention it explicitly in the description. It's necessary only to avoid superfluous changes in the file coming from lack of such ordering, and for better diff / merge workflows, of course. Mind that my intention is that metastore should be able to read and apply metadata from file even if the order is different, in accordance with Postel's law (robustness principle).

  4. I see where you are coming from, and I thought a bit about it before. We need to know all extended attributes to be able to remove ones that no longer exist, but as I don't want to ditch Postel's law, we need all xattrs at once, i.e. in one line. I call it a necessary compromise.

  5. SSTRING has semicolon escaped (so ; turns into \;) - no problem here.

  6. I call it a necessary compromise for great robustness.
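
The escaping rules from points 2 and 5 can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `escape_sstring` and `unescape_sstring` are hypothetical, file names are treated as raw bytes, and writing the backslash itself (and every non-ASCII byte) as `\xHH` is an assumption, since the proposal only fixes `\;` for the semicolon and `\xHH` for other special characters.

```python
import re


def escape_sstring(raw: bytes) -> str:
    """Escape a file name so ';' can safely act as a field separator."""
    out = []
    for byte in raw:
        ch = chr(byte)
        if ch == ";":
            out.append("\\;")  # semicolon gets the short escape
        elif ch == "\\" or byte < 0x20 or byte >= 0x7F:
            # Assumption: backslash, control and non-ASCII bytes use \xHH.
            out.append(f"\\x{byte:02X}")
        else:
            out.append(ch)
    return "".join(out)


_ESCAPE = re.compile(r"\\(?:x([0-9A-Fa-f]{2})|(;))")


def unescape_sstring(text: str) -> bytes:
    """Reverse escape_sstring(); every backslash starts an escape."""
    def replace(match: re.Match) -> str:
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        return ";"

    return _ESCAPE.sub(replace, text).encode("latin-1")
```

Because the backslash is always escaped, a reader can split a line on every `;` that is not preceded by a backslash without ambiguity.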

Your second comment:

I thought about INI before, but I don't think it really suits metastore needs. File names can have brackets, so you have to escape both, and quite likely fix handling of that case in such INI library. These libraries also usually "overwrite" repeated key in section (your decomposed xattr), so it's another bother to deal with (maybe there are event-based INI parsers, that would help a bit I guess).
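
The repeated-key concern can be illustrated with Python's standard `configparser`, used here only as one example of an INI library (other libraries may behave differently). The sample text is a shortened copy of the proposal above, and the helper names are hypothetical.

```python
import configparser

SAMPLE = """\
[metastore.c]
mode=644
xattr="name1":txt"value1"
xattr="name2":hex"00 01 02"
"""


def parse_lenient(text: str) -> dict:
    """With strict=False a repeated key silently overwrites the earlier one."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(text)
    return dict(parser["metastore.c"])


def parse_strict(text: str) -> str:
    """With the default strict=True a repeated key is rejected outright."""
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
    except configparser.DuplicateOptionError as exc:
        return type(exc).__name__
    return "ok"
```

Either way only one `xattr` entry can survive, so one-xattr-per-line would need a custom or event-based parser.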

I have to add that MeTaSt00r3 at the beginning of file is to preserve metastore file detection by existing tools that depend on this magic value. I don't see any great value in breaking it. I don't support comments in my format proposal, because I don't think such metadata file really needs them, they would be easily lost after saving metadata anyway, and they would require escaping another character in file names.


(for reference only, your project, you make decision)

Strictly speaking, metastore per se is David Härdeman's project. I only maintain an unofficial continuation (fork, if you prefer). I tried contacting David regarding his view of my continuation (whether it could become an officially blessed one), but I haven't received any reply yet.

dfandrich commented on June 12, 2024

Extending the .gitmeta file format that is maintained by the setgitperms.perl script that comes standard with git (in contrib) is an obvious starting point. This format has the advantage that it would be a seamless upgrade for current setgitperms users. This format looks like this:

CMake/Utilities.cmake mode=0660 uid=1001 gid=1001
CMakeLists.txt mode=0660 uid=1001 gid=1001
COPYING mode=0660 uid=1001 gid=1001
CTestConfig.cmake mode=0660 uid=1001 gid=1001
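
A minimal Python sketch of reading one such line, for illustration only. It assumes every line ends with exactly the three `key=value` tokens shown above and splits from the right; how `setgitperms.perl` itself parses the file has not been checked, and `parse_gitmeta_line` is a hypothetical name.

```python
def parse_gitmeta_line(line: str) -> tuple[str, dict[str, str]]:
    """Split 'path mode=... uid=... gid=...' into the path and its attributes."""
    # Assumption: exactly three trailing key=value tokens; everything
    # before them, spaces included, is taken as the path.
    path, *pairs = line.rstrip("\n").rsplit(" ", 3)
    attrs = dict(pair.split("=", 1) for pair in pairs)
    return path, attrs
```

The mode is stored as an octal string, so `int(attrs["mode"], 8)` gives the numeric value.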

przemoc commented on June 12, 2024

Metastore is also useful outside the git domain, so I'm not sure that adopting the .gitmeta file format of the setgitperms.perl script is the proper way to go. It also doesn't look space-in-filename-friendly (it's much more common to have a space ( ) in a filename than a semicolon (;)). If you're concerned about numerical UID/GID, then having a --numeric-owner option like tar's seems fine.

What I missed in my original suggestion is storing numerical ids next to the textual ones, so that they could be used as a fallback when the given user/group doesn't exist. I'll amend the issue description later. I'm thinking about putting the ids in parentheses.

danny0838 commented on June 12, 2024

I'm currently working on git-store-meta and here's the schema I came up with:

# generated by {TAB} git-store-meta {TAB} 1.1.2
<file> {TAB} <type> {TAB} <mtime> {TAB} <atime> {TAB} <mode> {TAB} <user> {TAB} <group> {TAB} <uid> {TAB} <gid>
back\\slash {TAB} f {TAB} 2015-04-20T17:00:57Z {TAB} 2015-04-20T17:03:55Z {TAB} 0664 {TAB} danny {TAB} danny {TAB} 1001 {TAB} 1001
data.txt {TAB} f {TAB} 2015-04-20T17:00:57Z {TAB} 2015-04-20T17:00:57Z {TAB} 0664 {TAB} danny {TAB} danny {TAB} 1001 {TAB} 1001
del\x7Fname {TAB} f {TAB} 2015-04-20T17:00:57Z {TAB} 2015-04-20T17:00:57Z {TAB} 0664 {TAB} danny {TAB} danny {TAB} 1001 {TAB} 1001
subdir {TAB} d {TAB} 2015-04-20T17:00:57Z {TAB} 2015-04-20T17:00:58Z {TAB} 0775 {TAB} danny {TAB} danny {TAB} 1001 {TAB} 1001
subdir/file.txt {TAB} f {TAB} 2015-04-20T17:00:57Z {TAB} 2015-04-20T17:00:57Z {TAB} 0664 {TAB} danny {TAB} danny {TAB} 1001 {TAB} 1001

Columns are variable. The first two columns, <file> and <type>, always exist, while the existence and order of the other columns depend on command arguments.

File names have backslashes ("\") and control chars (0x00-0x1F, 0x7F) escaped using "\x##" notation, if there are any.

If <user> and <uid> are both provided, git-store-meta attempts to apply the user name first, and falls back to applying the uid if that fails. <group>/<gid> works the same way.

Timestamps always store the UTC time, without the fractional part of seconds.

Rows except the first two are stored sorted by UTF-8 encoding. This is primarily for the --update mechanism to work properly. Though it still works without a proper sort if the user hacks in the data.
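
A minimal Python sketch of reading such a file, to make the variable-column idea concrete. It is not git-store-meta's own code: it assumes real tab characters where the sample shows {TAB}, that the second line names the columns in angle brackets, and that `\x##` is the only escape form. `parse_store` and `unescape_name` are hypothetical names.

```python
import re

_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def unescape_name(name: str) -> str:
    """Turn every \\x## sequence back into the character it stands for."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


def parse_store(text: str) -> list[dict[str, str]]:
    """Return one dict per data row, keyed by the column names in the header."""
    lines = text.split("\n")
    # lines[0] is the "# generated by" comment, lines[1] lists the columns.
    fields = [field.strip("<>") for field in lines[1].split("\t")]
    rows = []
    for line in lines[2:]:
        if not line:
            continue
        row = dict(zip(fields, line.split("\t")))
        row["file"] = unescape_name(row["file"])
        rows.append(row)
    return rows
```

Because the header row carries the column names, a reader like this needs no command-line options to interpret an existing file.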

I think this should be readable, flexible, and hackable enough. I could be wrong, though, and any feedback is welcome.

I currently don't really use metastore since I cannot get it to work on MsysGit and it lacks several features I need. However, it's always nice to see metastore, or maybe a "C version git-store-meta"(?), flourish. :)

JPT77 commented on June 12, 2024

Great. Thanks for the info.
If I happen to have spare time I will try it.

I started a project too (in Java), but I just made it far too complex... I tried to cover every possible use case.

przemoc commented on June 12, 2024

@danny0838 Your schema doesn't seem to be good enough, because it requires some predefined order of attributes (via command line, configuration or something else), so it's a clunky deal. Tab is a really bad separator space-wise. Applying metadata should be possible without any additional options, which is why attributes should be stored as parameter=value. I aim for conciseness, which is why I suggested one-letter parameters. At the same time, as I already explained, I think that having one parameter per line is best for diff/merge cases. (If someone is truly worried about inefficiency here, then a compact mode could be introduced and switched on in configuration. It would put all parameters on one line next to the filename, but I think such an addition is the least important thing now.)

I think my original textual format proposal is still the best one so far. Nevertheless, configuration (#7) will be needed to land first, and to avoid stupid stuff in configuration, some other stuff has to go in even earlier, like file/dir excluding (#8, #9), as I won't ever allow to have this outrageous git option in configuration file.

(BTW, sorry to all of you hoping for a quicker metastore revival. I haven't abandoned metastore, I just wasn't able to squeeze in time to work on it lately. I do hope to finally push things a bit forward in May. I planned v1.1 to be released in April, but it seems it will have to wait till May.)

danny0838 commented on June 12, 2024

@przemoc If a stored data file already exists, git-store-meta will parse it and use the same field definitions if none are given on the command line, and thus the field definition parameters only have to be provided on the command line once (i.e. at the first --store) in usual usage, which shouldn't be too annoying.

Personally, I might want to store mtime only (for versioning mtime-sensitive binary files), or to store mode only, or mode and mtime (for some web projects), or maybe other possible cases I haven't met. Therefore the flexibility to select which fields are to be stored is a must-have feature, at least for me.

I'm also considering adding shortcuts for some usual column packs. For example ":all" means "user,group,mode,mtime,atime", ":all2" means "uid,gid,mode,mtime,atime", and ":mm" means "mode,mtime", etc. Though this is still pending.
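
As a rough Python illustration of how such shortcuts might expand: the three packs are taken from the comment above, while the function name, the ability to mix shortcuts with plain field names, and the de-duplication are assumptions, not git-store-meta behavior.

```python
FIELD_PACKS = {
    ":all": ["user", "group", "mode", "mtime", "atime"],
    ":all2": ["uid", "gid", "mode", "mtime", "atime"],
    ":mm": ["mode", "mtime"],
}


def expand_fields(spec: str) -> list[str]:
    """Expand a comma-separated field spec, replacing shortcuts by their packs."""
    fields: list[str] = []
    for item in spec.split(","):
        for name in FIELD_PACKS.get(item, [item]):
            if name not in fields:  # keep first occurrence, drop repeats
                fields.append(name)
    return fields
```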

Just to clarify this point. I have no comment about your other concerns. It's your project, after all. :)

przemoc commented on June 12, 2024

@danny0838 I totally agree about flexibility regarding which parameters should be stored or applied. That's why I put work-on-parameters as one of the options in the proposed configuration file (#7). It would be mox by default (mode, owner, xattr, which are already stored in the binary format), but could be changed as the user wishes (configuration can be put at system, global and local level). The idea is that applying metadata would apply whatever parameters are provided within the metastore file, but only within the set defined in the above-mentioned option. The metastore file per se is not required to have all parameters defined for all files when applying metadata. So the mtime-only case will definitely be supported.
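
The filtering rule described here (apply whatever the file provides, but only within the configured set) could look roughly like this in Python. The one-letter keys follow the "m o t x" naming used earlier in the thread; the function name and the dictionary representation are assumptions for illustration, not metastore code.

```python
def parameters_to_apply(stored: dict[str, str], work_on: str = "mox") -> dict[str, str]:
    """Keep only the stored parameters whose one-letter key is enabled."""
    enabled = set(work_on)  # e.g. "mox" -> {"m", "o", "x"}
    return {key: value for key, value in stored.items() if key in enabled}
```

An entry that only carries `t` is simply skipped under the default `mox` setting and applied once `t` is enabled.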

I'm wondering only, whether it would be desired to have owner, i.e. user:group as defined in my first comment, split into two parameters. As I already mentioned in one of the comments, my original suggestion lacks numerical id fallback and I think it could be provided after slash (/), i.e.

file;o=przemoc/1000:users/100;

OTOH using numerical ids only (like tar --numeric-owner) should also be possible, so flexibility may require some additional options, which should be fine as long as default behavior will be decent.
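
A small Python sketch of parsing the owner value with its numeric fallback. It assumes names contain neither `/` nor `:`; `Owner`, `parse_owner` and `resolve_id` are hypothetical names, and the lookup is passed in so the logic stays independent of the system's user database.

```python
from typing import Callable, NamedTuple, Optional


class Owner(NamedTuple):
    user: str
    uid: Optional[int]
    group: str
    gid: Optional[int]


def parse_owner(value: str) -> Owner:
    """Parse 'user/uid:group/gid'; the '/id' parts are optional."""
    user_part, group_part = value.split(":", 1)
    user, _, uid = user_part.partition("/")
    group, _, gid = group_part.partition("/")
    return Owner(user, int(uid) if uid else None, group, int(gid) if gid else None)


def resolve_id(name: str, fallback: Optional[int],
               lookup: Callable[[str], int]) -> Optional[int]:
    """Prefer the id the name maps to locally; use the stored id otherwise."""
    try:
        return lookup(name)
    except KeyError:  # e.g. pwd.getpwnam() raises KeyError for unknown names
        return fallback
```

On a Unix system the lookup could be `lambda name: pwd.getpwnam(name).pw_uid` for users and `lambda name: grp.getgrnam(name).gr_gid` for groups.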

I don't like the idea of successfully changing user but failing to change group for instance. Are there any real scenarios where such ok-fail case would be still ok after all?

I don't find any compelling reason to even optionally support atime. Maybe you could provide me some?

danny0838 commented on June 12, 2024

@przemoc I'd just let it go if the user change succeeded and the group change failed, since the user is warned about any failure.

As for atime, I personally haven't come up with a real use case, and I'm just providing it since it's easy and git-cache-meta provides it. Though it seems that several programs would look for the last access time to determine whether a file can be safely removed, as this thread tells.

smemsh commented on June 12, 2024

Instead of your own file format, perhaps consider using YAML
to directly serialize metastore's data structures that represent
the entries. YAML has many of the desirable properties mentioned
earlier in the ticket, plus the advantage of existing tools,
syntax highlighting, etc.

xkrug-bubeck commented on June 12, 2024

I just did a straight-forward textual implementation: xkrug-bubeck/metastore@e6b514b

Not really much has changed except all is text now.
And there are line endings between the files/folders and semicolons between the values.

someusr@debian:/opt/testdir$ /opt/github/metastore/bin/metastore -s
someusr@debian:/opt/testdir$ cat .metadata 
MeTaSt00r3TEXT0001
./dir_with_a_file/a_file:someusr:someusr:1478614284:198484640:33188:0:
./.metadata:someusr:someusr:1478705827:807964:33188:0:
./dir_with_a_file:someusr:someusr:1478612652:256400519:16877:0:
./empty_dir:someusr:someusr:1478612636:976399731:16877:0:
.:someusr:someusr:1478622855:166926443:16877:0:
./belongs_root_with_caps:root:root:1478612674:704401676:33188:1:security.capability:20:1:0:0:2:0:48:128:0:0:48:128:0:0:0:0:0:0:0:0:0:
./belongs_root:root:root:1478612666:152401235:33188:0:
./mnt:someusr:someusr:1478622855:166926443:16877:0:
./belongs_someusr:someusr:someusr:1478612662:552401050:33188:0:

The only downside of this at the moment: It will fail at a file that includes the separator char ":".
I personally can live with that at the moment.
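
For readers trying to interpret the sample, here is a Python sketch that decodes one line. The field meanings are inferred from the sample output only (path, owner, group, mtime seconds, mtime nanoseconds, numeric mode, xattr count, then name, length and byte values per xattr) and have not been checked against the linked commit; `parse_entry` is a hypothetical name.

```python
import stat


def parse_entry(line: str) -> dict:
    """Decode one ':'-separated entry; breaks if the path contains ':'."""
    fields = line.rstrip("\n").split(":")
    path, owner, group = fields[0:3]
    mtime_sec, mtime_nsec, mode, xattr_count = map(int, fields[3:7])
    pos = 7
    xattrs = {}
    for _ in range(xattr_count):
        # Assumed layout: name, value length, then one decimal number per byte.
        name, length = fields[pos], int(fields[pos + 1])
        pos += 2
        xattrs[name] = bytes(int(b) for b in fields[pos:pos + length])
        pos += length
    return {
        "path": path,
        "owner": owner,
        "group": group,
        "mtime": (mtime_sec, mtime_nsec),
        "mode": stat.filemode(mode),  # e.g. 33188 -> '-rw-r--r--'
        "xattrs": xattrs,
    }
```

Read this way, 33188 is a regular file with mode 644 and 16877 is a directory with mode 755.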

Edit:

  • Merged my dev branch with my master.
    -- Should merge without conflicts.
    -- Replaced ';' with ':' as separator in regard to first posted patch.

Edit2:
Colon is a terrible separator as it is used by Debian apt. Reverted to using semicolons ';'.
bubeck@f7803c79d0421dd15685a37b1bfb7516ef499a91

przemoc commented on June 12, 2024

Hi, Jürgen! Thanks for the contribution, but your straight-forward textual format is not what I wish for and it's not what I would like to see in metastore, therefore I cannot accept it.

But others may find it useful, so they can use the code from your repository if they find it good enough for their needs. It's (almost) always a good thing to have alternatives.

jirutka commented on June 12, 2024

@xkrug-bubeck You can use my git-metafile instead. ;)

petersjt014 commented on June 12, 2024

@przemoc Might I suggest the recutils format? It's fairly simple, and by using it we wouldn't need to create yet another textual data format (which is a bonus). Even without the recutils package installed, it can easily be manipulated in an editor (plus emacs and vim have plugins), or with sed/cut and such.

It's flexible enough that existing unix tools can be made to output it. Consider the following:

find testdir -printf 'name: %P\ntype: %y\nsize: %s\ndepth: %d\nmode: %m\ninode: %i\natime: %As\nctime: %Cs\nmtime: %Ts\n\n' > files.rec

This looks ugly, but you can run advanced queries like this:

recsel files.rec -e "name ~ '.*/foo/bar/baz-version-[12].{0,3}$' \
 && mode != 777 \
 && size >= 4096 \
 && mtime > $(date -d 2020-05-20 +%s)"

and get output like this:

name: projects/foo/bar/baz-version-2.1
type: d
size: 4096
depth: 2
mode: 755
inode: 12468250
atime: 1584162005
ctime: 1584162002
mtime: 1584162009
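
To show how little is needed to read such records without recutils installed, here is a deliberately simplified Python sketch. It handles only plain `name: value` lines separated by blank lines, as in the output above; multi-line values, record descriptors and repeated fields are not covered, and `parse_records` is a hypothetical name.

```python
def parse_records(text: str) -> list[dict[str, str]]:
    """Parse blank-line-separated 'name: value' records into dictionaries."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            if not line or line.startswith("#"):  # skip blanks and comments
                continue
            name, _, value = line.partition(":")
            record[name] = value.strip()
        if record:
            records.append(record)
    return records
```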

There are a number of other advantages too:

  • Recutils also comes with rec2csv, allowing for additional flexibility.
  • There is a type system. The type of (for example) mode could be a regex string. A mode like "999" would be detected and raise an integrity error when checked with the included tool recfix.
  • There is a constraint system available, so recfix could detect integrity violations when (for example) two files have the same basename and depth.
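
The kind of check the type system enables can be mimicked in a few lines of Python. This is an analogy only, not recutils syntax: the regular expression for an octal mode and the name `invalid_modes` are assumptions made for the illustration.

```python
import re

# Assumption: a valid mode is three or four octal digits, e.g. 755 or 0644.
MODE_RE = re.compile(r"[0-7]{3,4}")


def invalid_modes(records: list[dict[str, str]]) -> list[str]:
    """Return the names of records whose mode is missing or not octal."""
    return [
        record.get("name", "")
        for record in records
        if not MODE_RE.fullmatch(record.get("mode", ""))
    ]
```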
