Given that we have provided packages etc. for all platforms, and all kinds of supporting infrastructure, I do not see what else we could add to have this issue resolved besides encouraging that datalad be installed/shipped on BCE. Please feel free to reopen if there is anything more in particular we should do ;)
from datalad.
Especially if you can manage container layers in git annex (and so avoid duplicating them).
IIRC that is what we already have for Docker images; we just have not figured out how to "link" them (via URLs) back to Docker Hub.
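The deduplication idea mentioned above relies on content addressing: with its default checksum backends, git-annex names each file by a key derived from its checksum, so a layer shared by several images would only need to be stored once. Below is a minimal, self-contained Python sketch of that idea. `LayerStore` and `add_image` are hypothetical names, and the key string only mimics the shape of a git-annex SHA256 key; nothing here calls git-annex.

```python
import hashlib


class LayerStore:
    """Toy content-addressed store: every distinct blob is kept exactly once."""

    def __init__(self):
        self.blobs = {}  # key -> bytes

    def add(self, data: bytes) -> str:
        # Key shaped like a git-annex SHA256 key ("SHA256-s<size>--<hash>");
        # this only mimics the idea, it does not call git-annex.
        key = f"SHA256-s{len(data)}--{hashlib.sha256(data).hexdigest()}"
        self.blobs.setdefault(key, data)  # identical content maps to the same key
        return key


def add_image(store, layers):
    """Store an image's layers and return the keys its manifest would list."""
    return [store.add(layer) for layer in layers]


store = LayerStore()
base = b"shared base OS layer"
image_a = add_image(store, [base, b"python tooling layer"])
image_b = add_image(store, [base, b"R tooling layer"])
print(len(store.blobs), "unique blobs for", len(image_a) + len(image_b), "layer references")
```

Two images that share a base layer reference the same key, so only three blobs are stored for four layer references.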
Hi @davclark, thanks for getting in touch. Such "issues" are indeed best noted among datalad issues, so imho this could well be the right venue for the discussion. For the datalad "distribution" portion, probably no heavy dependencies would be necessary (probably only GitPython https://github.com/gitpython-developers/GitPython and optionally patool), and we will upload to PyPI when the time comes. But it would still require a git-annex installation... maybe we will get insane enough to provide precooked wheels or something that provides both datalad and git-annex binaries across all necessary platforms.
The "integration" part, though, might also include support for the Berkeley data management system you already have in place, which @arokem used and reminded me about at SfN... I forgot the name/URL for it... What was it? ;-)
Here's the data management system I was telling you about (actually a Stanford thing):
https://github.com/scitran/nims
For more context: https://scitran.stanford.edu/
The Berkeley crew is in Redwood, I think: http://crcns.org/
Oh yeah - I've used that one too :-)
On Wed, 19 Nov 2014, Dav Clark wrote:
Anywho, git annex is awesome as a backend, but it's still too cumbersome
to recommend to computational scientists who aren't necessarily "committed
to the cause." I'd love to support your efforts to get more people using
git and git annex in a sensible way for science!
Hi Dav,
I ran into your https://github.com/dlab-berkeley/python-fundamentals and saw that you used git-annex for one of the data files in this course... btw, here is my collection (largely borrowed) of materials for the course I taught: https://github.com/dartmouth-pbs/psyc161, where I also used git-annex for some demo datasets (haxby2001, pymvpa tutorial). That brought me back to this elderly gh issue.
But what I wanted to say: if you need a recent-ish git-annex on your Ubuntu/Debian VM/whatever, I now provide git-annex-standalone builds from NeuroDebian: http://neuro.debian.net/pkgs/git-annex-standalone.html
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Research Scientist, Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik
I love the way that the issue tracker here facilitates all kinds of useful tangential conversations. On that note, this is very timely, as a group of us are meeting in Berkeley (with a satellite in New York, and maybe elsewhere) to put together social science "data carpentry" materials Jul 24-25:
dlab-trainings/social-data-carpentry-2015#1
You would be most welcome to join! Are you in Cambridge or what?
On Mon, 13 Jul 2015, Dav Clark wrote:
You would be most welcome to join! Are you in Cambridge or what?
I am at Dartmouth College, New Hampshire... so around 3h away from Cambridge.
If someone takes care of setting up the hangout, I might well participate to some degree ;-)
Cheers!
I added your "one russian" email to the event on Google Calendar; there's a hangout on that event now. We'll still need to be clear about when we'll be on that hangout, but at least it's there (and I plan to leave it on most of the time, at a minimum posting updates in the text-messaging area).
Hey @davclark -- just ran into this issue and wondered what the status of your endeavors with BCE is?
Locally at Dartmouth, for many of us the CE consists of heudiconv to auto-convert DICOMs into BIDS DataLad datasets, datalad (git/git-annex) available everywhere, and Singularity containers (now more often also kept under datalad/git-annex), so there is modularity and clean control over the versions of everything, including, more and more often, the environments used. We also increasingly use `datalad run` to record what was done (see also the recently merged enhanced rerun functionality: #2076).
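To make the idea behind `datalad run`/rerun concrete: the command, its inputs/outputs, and its exit code are recorded in the commit message, so the exact command can later be re-executed. The Python sketch below is a toy model of that round trip, not DataLad's implementation; the marker lines and the `[DATALAD RUNCMD]` prefix only approximate what DataLad writes, and `run_and_record`, `parse_record`, and `rerun` are hypothetical helpers.

```python
import json
import shlex
import subprocess
import sys

# Marker lines modeled on how DataLad embeds run records in commit messages;
# treat the exact format here as an approximation, not DataLad's spec.
START = "=== Do not change lines below ==="
END = "^^^ Do not change lines above ^^^"


def run_and_record(argv, message, inputs=(), outputs=()):
    """Run argv, then return a commit-message-style string with a JSON run record."""
    proc = subprocess.run(argv, check=False)
    record = {
        "cmd": shlex.join(argv),  # store a re-executable command string
        "exit": proc.returncode,
        "inputs": list(inputs),
        "outputs": list(outputs),
    }
    payload = json.dumps(record, indent=1, sort_keys=True)
    return f"[DATALAD RUNCMD] {message}\n\n{START}\n{payload}\n{END}\n"


def parse_record(commit_message):
    """Recover the run record from a commit message (what a rerun needs)."""
    body = commit_message.split(START, 1)[1].split(END, 1)[0]
    return json.loads(body)


def rerun(commit_message):
    """Re-execute the recorded command and return its exit code."""
    record = parse_record(commit_message)
    return subprocess.run(shlex.split(record["cmd"]), check=False).returncode


msg = run_and_record([sys.executable, "-c", "print('hello')"], "say hello")
rec = parse_record(msg)
print(rec["exit"], rerun(msg))
```

Keeping the record machine-readable inside the commit is what lets a rerun reproduce the step without the user retyping the command.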
Good timing!
Although I left Berkeley a year and a half ago, I can report on the status there. It seems the community has standardized on data science Docker containers. These have been a point of collaboration for a pretty broad set of researchers. They can be delivered via JupyterHub (using Kubernetes for scale), and this is done for the intro to data science courses, now for thousands of students.
Also at UC Berkeley are several folks working on the Rocker stacks, again using Docker to deliver standardized R environments.
Personally, I'm just ramping up on some projects in the same vein, but nothing that's ready to share yet!
As for containers, I love Docker, but for anything "computation oriented" I prefer Singularity, since
- it is a solution created with HPC in mind
- we now have the https://github.com/datalad/datalad-container/ DataLad extension to assist with "standardizing" the management of containers within the DataLad framework
- we have https://github.com/ReproNim/containers with popular neuroimaging images pre-populated, and a shim script to a) ensure execution in an isolated environment and b) run Singularity via Docker where needed/possible (OS X)
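A shim of the kind described in the last bullet boils down to choosing a runner per platform. Here is a hypothetical Python sketch (not the ReproNim script) that only builds the command line: native Singularity on Linux when it is installed, otherwise Singularity inside a Docker container. The helper image name is a placeholder, and the assumption that Singularity inside Docker needs `--privileged` is ours, not taken from the ReproNim shim.

```python
import platform
import shutil

# Placeholder: a Docker image assumed to have Singularity installed inside it.
# This name is hypothetical, not the image used by the ReproNim shim.
SINGULARITY_IN_DOCKER_IMAGE = "example/singularity-in-docker"

# Singularity flags: --cleanenv drops the host environment,
# --containall additionally contains file systems, PID and IPC.
ISOLATION_FLAGS = ["--cleanenv", "--containall"]


def build_command(image_path, cmd, system=None, which=shutil.which):
    """Return an argv that runs `cmd` inside the Singularity image `image_path`.

    Uses a native Singularity on Linux when available, otherwise falls back
    to running Singularity inside a (privileged) Docker container.
    """
    system = system or platform.system()
    if system == "Linux" and which("singularity"):
        return ["singularity", "exec", *ISOLATION_FLAGS, image_path, *cmd]
    if which("docker"):
        return [
            "docker", "run", "--rm", "--privileged",
            # Bind the image file read-only into the helper container.
            "-v", f"{image_path}:/image.sif:ro",
            SINGULARITY_IN_DOCKER_IMAGE,
            "singularity", "exec", *ISOLATION_FLAGS, "/image.sif", *cmd,
        ]
    raise RuntimeError("neither singularity nor docker is available")


print(build_command("/data/images/app.sif", ["python", "--version"],
                    system="Darwin", which=lambda name: "/usr/local/bin/" + name))
```

Injecting `system` and `which` keeps the selection logic testable without either runtime installed.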
Hey @yarikoptic!
I am currently "all in" on Docker and haven't had much time for Singularity. Specifically, I currently work at Gigantum where we're focused on automating Docker and Git for mostly single-machine workflows.
I'd talked to Satra about ReproNim a while ago, and I think there's a fundamental challenge: ReproNim needs to be owned by the folks getting grants, while Gigantum is charting a course towards a revenue-based model.
That said, your model of managing containers in git-annex seems sensible. Especially if you can manage container layers in git annex (and so avoid duplicating them).
And we have the goal of a generic dataset where community drivers can be installed as well... so supporting datalad / git-annex is still in mind. We don't have the bandwidth to drive this right now, but hopefully later this year we'll have that interface clarified and easier to work with.
Anyway, glad you're keeping me and other members of the BCE project up to date! I wonder if there's a more discoverable place to have this conversation? But for now it works to get me to read it at least :P