Comments (6)
If I needed this today, I would try to find a way to write a preferred content expression for my local repository that covered the files I wanted to have present. Then I could use git annex get --auto to get new versions of them after syncing. Or, I might try to reorganize my repo, so I worked on a branch that contained only the files I wanted to have present locally, and then I could simply run git annex get . there.
It would be nice if there was a haveoldversion(*) available in the preferred content language. But, it seems it would be quite expensive. It would require running git log on each file, to get past versions of the key used by the file. So I don't think that querying git for the necessary information on the fly is feasible.
What might be feasible is adding a mapping from a filename to a flag bit. git-annex get would set the flag, and git annex drop would unset it. Other updates to the tree, including git-annex sync, would not affect it.
The mapping could I suppose just be some piece of filesystem metadata for the file, or a .git/annex/blah/path/to/file, but these are pretty hacky approaches. I'm looking at adding databases to git-annex anyway, for http://git-annex.branchable.com/design/caching_database/ , although I think that all the other use cases are of a database of information about a key, not about a file.
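To make the flag idea concrete, here is a minimal sketch of such a filename-to-flag mapping kept in a small SQLite table. Nothing here is git-annex code: the table name wanted and the helpers open_flags, on_get, on_drop and is_wanted are hypothetical names chosen for illustration, and the real caching database may look quite different.

```python
import sqlite3


def open_flags(path=":memory:"):
    """Open (or create) a hypothetical table of per-file 'wanted' flags."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS wanted (file TEXT PRIMARY KEY)")
    return db


def on_get(db, file):
    # Assumed hook: runs when the user explicitly gets a file, setting the flag.
    db.execute("INSERT OR IGNORE INTO wanted (file) VALUES (?)", (file,))
    db.commit()


def on_drop(db, file):
    # Assumed hook: runs when the user explicitly drops a file, clearing the flag.
    db.execute("DELETE FROM wanted WHERE file = ?", (file,))
    db.commit()


def is_wanted(db, file):
    # Other tree updates (for example a sync) never touch the table,
    # so the flag survives a change of the key behind the file.
    row = db.execute("SELECT 1 FROM wanted WHERE file = ?", (file,)).fetchone()
    return row is not None
```

Because the table is keyed by filename rather than by annex key, a file that was fetched once stays flagged when a sync replaces its content, which is the property the proposal relies on.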
from datalad.
Thinking about this some more, a haveoldversion(*) would only be stable if the source of the data was originally git. Otherwise, git-annex in one repository would not be able to tell if another repository that uses that expression wanted a file or not.
Retrieving the data from git and caching it could work. Fits in more with the caching database plan too. git-annex commands like get/drop that update the location log could also update the cache, which would avoid expensive cache misses sometimes. But often enough for this to be reasonably fast? The cache should ideally also work when checking the preferred content of remotes. Maybe the filename to old version of key mapping would be the thing cached (but branches complicate this).
On Thu, 25 Sep 2014, Joey Hess wrote:
> If I needed this today, I would try to find a way to write a preferred
> content expression for my local repository that covered the files I wanted
> to have present. Then I could use git annex get --auto to get new versions
> of them after syncing. Or, I might try to reorganize my repo, so I worked
> on a branch that contained only the files I wanted to have present
> locally, and then I could git annex get.
This came up just as a prospective use case -- no immediate (i.e. today)
resolution is really needed. But thanks for outlining workarounds -- I
didn't know about the --auto option and its behavior for the get operation.
> It would be nice if there was a haveoldversion(*) available in the
> preferred content language. But, it seems it would be quite expensive. It
> would require running git log on each file, to get past versions of the
> key used by the file. So I don't think that querying git for the necessary
> information on the fly is feasible.
But what if there was a record of the last treeishes for a worktree
branch and the corresponding state of the git-annex branch at which you know
it was in the "desired state"? Then upon the "upgrade" command (via
whatever actual command it would happen) annex would check which files
have changed their availability (i.e. for which the load disappeared) --
that would not require history traversal -- and make them available,
then update both of those references to the new treeishes for that
branch and git-annex. Thinking about it though, I see that it might
interfere with a user-invoked 'drop' command... but maybe then only the
reference for the git-annex branch component of
those pairs would get updated, and that would "resolve" it?
Just exercising this idea -- if faulty, just say so and ignore ;)
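A rough sketch of that comparison, assuming the recorded "desired state" and the updated tree can each be reduced to a plain path-to-key mapping, plus sets of keys whose content was present at the snapshot and is present now. How those mappings would be extracted from git and the git-annex branch is left out, and files_to_reobtain is a hypothetical name, not an existing git-annex or datalad function.

```python
def files_to_reobtain(old_tree, new_tree, present_at_snapshot, present_now):
    """Return paths whose content was present at the recorded snapshot
    but is missing after the update because the key behind them changed.

    old_tree, new_tree: dict mapping path -> annex key (assumed inputs).
    present_at_snapshot, present_now: sets of keys with local content.
    """
    to_get = []
    for path, new_key in new_tree.items():
        old_key = old_tree.get(path)
        if old_key is None or old_key == new_key:
            continue  # newly added or unchanged file: nothing lost
        if old_key in present_at_snapshot and new_key not in present_now:
            to_get.append(path)  # had content before, update took it away
    return sorted(to_get)
```

Only the two snapshots are compared, so no history traversal is needed. A file the user dropped on purpose would still have its old key in the snapshot's presence set unless drop also updated that record, which is the interference noted above.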
> What might be feasible is adding a mapping from a filename to a flag bit.
> git-annex get would set the flag, and git annex drop would unset it. Other
> updates to the tree, including git-annex sync, would not affect it.
yeap -- sounds good to me ;)
> The mapping could I suppose just be some piece of filesystem metadata for
> the file, or a .git/annex/blah/path/to/file, but these are pretty hacky
> approaches. I'm looking at adding databases to git-annex anyway, for
> http://git-annex.branchable.com/design/caching_database/ , although I
> think that all the other use cases are of a database of information about
> a key, not about a file.
thanks -- I will check it out.
Yaroslav O. Halchenko, Ph.D.
Yes, there is probably the possibility of putting this into git annex sync --content as a special case, and so optimising it. Using preferred content makes the feature much more generic and widely usable though.
Also being discussed at http://git-annex.branchable.com/bugs/present_files__47__directories_are_dropped_after_a_sync/
I will close this old issue now. We do have update --reobtain-data.