
Comments (14)

yarikoptic commented on May 30, 2024

Given that we have provided packages etc. for all platforms, and all kinds of supporting infrastructure, I do not see what else we could add to have this issue resolved besides encouraging having datalad installed/shipped on BCE. Please feel free to reopen if there is anything in particular we should do more ;)

from datalad.

yarikoptic commented on May 30, 2024

Especially if you can manage container layers in git annex (and so avoid duplicating them).

IIRC that is what we already have for docker images, only IIRC we have not figured out how to "link" (urls) them back to docker hub.


yarikoptic commented on May 30, 2024

Hi @davclark, thanks for getting in touch. Such "issues" are indeed best noted among datalad issues, so imho this could stay the right venue for the discussion. For the datalad "distribution" portion, no heavy dependencies would probably be necessary (probably only GitPython https://github.com/gitpython-developers/GitPython and optionally patool), and we will upload to pypi whenever the time comes. But it would still require a git-annex installation... maybe we would get insane enough to provide precooked wheels or something which provides both datalad and git-annex binaries across all necessary platforms.
The "integration" part, though, might also constitute support of the Berkeley data management system you already have in place and which @arokem used and reminded me about at SfN... I forgot the name/url for it... What was it? ;-)


arokem commented on May 30, 2024

Here's the data management system I was telling you about (actually a
Stanford thing):

https://github.com/scitran/nims

For more context: https://scitran.stanford.edu/

On Fri, Nov 21, 2014 at 12:15 PM, Yaroslav Halchenko wrote:

Hi @davclark https://github.com/davclark thanks for getting in touch.
Such "issues" are indeed best noted among datalad issues, so imho this
could stay the right venue for the discussion. For the datalad
"distribution" portion no heavy dependencies (probably only GitPython
https://github.com/gitpython-developers/GitPython and optionally patool)
probably would be necessary and we will upload to pypi whenever time comes.
But it would still require git-annex installation... may be we would get
insane enough to provide precooked wheels or smth which provides both
datalad and git-annex binaries across all necessary platforms.

"Integration" part though might also constitute support of the Berkeley
data management system you already have in place and @arokem
https://github.com/arokem used and reminded me about at SfN... forgot
the name/url for it... What was it? ;-)



davclark commented on May 30, 2024

The Berkeley crew is in Redwood, I think: http://crcns.org/


arokem commented on May 30, 2024

Oh yeah - I've used that one too :-)

On Fri, Nov 21, 2014 at 1:55 PM, Dav Clark wrote:

The berkeley crew is in Redwood, I think: http://crcns.org/



yarikoptic commented on May 30, 2024

On Wed, 19 Nov 2014, Dav Clark wrote:

Anywho, git annex is awesome as a backend, but it's still too cumbersome
to recommend to computational scientists who aren't necessarily "committed
to the cause." I'd love to support your efforts to get more people using
git and git annex in a sensible way for science!

Hi Dav,

I have run into your
https://github.com/dlab-berkeley/python-fundamentals to see that you
have used git-annex for one of the data files in this
course... btw -- here is my collection (largely borrowed) of materials
for the course I have taught: https://github.com/dartmouth-pbs/psyc161
where I also have used git-annex for some demo datasets (haxby2001,
pymvpa tutorial). That brought me back to this elderly gh issue.

But what I wanted to say: if you need recentish git-annex on your
Ubuntu/Debian VM/whatever, I now provide builds of git-annex-standalone
from neurodebian: http://neuro.debian.net/pkgs/git-annex-standalone.html

Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Research Scientist, Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik


davclark commented on May 30, 2024

I love the way that the issue tracker here facilitates all kinds of useful tangential conversations. On that note, this is very timely, as a group of us are meeting in Berkeley (with a satellite in New York, and maybe elsewhere) to put together social science "data carpentry" materials Jul 24-25:

dlab-trainings/social-data-carpentry-2015#1

You would be most welcome to join! Are you in Cambridge or what?


yarikoptic commented on May 30, 2024

On Mon, 13 Jul 2015, Dav Clark wrote:

us are meeting in Berkeley (with a satellite in New York, and maybe
elsewhere) to put together social science "data carpentry" materials Jul
24-25:

dlab-trainings/social-data-carpentry-2015#1

You would be most welcome to join! Are you in Cambridge or what?

I am at Dartmouth College, New Hampshire... so around 3h away from
Cambridge.

If someone takes care of setting up the hangout, I might well
participate to some degree ;-)

Cheers!


davclark commented on May 30, 2024

I added your "one russian" email to the event on google calendar - there's a hangout on that event now. We'll still need to be clear about when we'll be on that hangout, but at least it's there (and I'll plan to leave that on mostly, at a minimum posting updates in the text-messaging area).


yarikoptic commented on May 30, 2024

Hey @davclark -- I just ran into this issue and wondered what the status of your endeavors with BCE is.
Locally at Dartmouth, our CE for many consists of heudiconv to autoconvert DICOMs into BIDS DataLad datasets, datalad (git/git-annex) available around, and singularity containers (now more often also under datalad/git-annex), so there is modularity and clean control over all versions and, more often now, over the environments which are used. Also, datalad run is used more and more to record what was done (see also the recently merged enhanced rerun functionality: #2076).
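
For readers who have not seen it, here is a minimal sketch of what recording a step with datalad run could look like. It is written as a small Python helper that only assembles the command line and does not execute anything. The helper name `build_run_command`, the paths, and the script name are hypothetical and chosen for illustration; `-m`, `--input`, and `--output` are options of the `datalad run` command.

```python
import shlex


def build_run_command(message, command, inputs=(), outputs=()):
    """Assemble (but do not execute) a `datalad run` invocation.

    Hypothetical helper, for illustration only.
    """
    argv = ["datalad", "run", "-m", message]
    for path in inputs:
        # Files the command reads; datalad can fetch them before running.
        argv += ["--input", path]
    for path in outputs:
        # Files the command is expected to write or modify.
        argv += ["--output", path]
    # The command to record is passed as a single trailing argument.
    argv.append(command)
    return argv


# Hypothetical dataset layout and script, assumed only for this example.
argv = build_run_command(
    message="Convert DICOMs for sub-01",
    command="bash code/convert.sh sub-01",
    inputs=["sourcedata/sub-01"],
    outputs=["sub-01"],
)
print(shlex.join(argv))
```

The printed line can be pasted into a shell inside a DataLad dataset. A step recorded this way is stored in the commit history, which is what `datalad rerun` works from.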


davclark commented on May 30, 2024

Good timing!

While I left Berkeley a year and a half ago, I can report on the status there. It seems the community has standardized on data science docker containers. These have been a point of collaboration for a pretty broad set of researchers. They can be delivered via JupyterHub (using Kubernetes for scale), and this is now done for the intro to data science courses for thousands of students.

Also at UC Berkeley are several folks working on the rocker stacks. Again, using Docker to deliver standardized R environments.

Personally, I'm just ramping up on some projects in the same vein, but nothing that's ready to share yet!


yarikoptic commented on May 30, 2024

As for containers, I love docker but I love singularity for anything "computation oriented" since

  • it is a solution created with HPC in mind
  • we have https://github.com/datalad/datalad-container/ datalad extension now to assist with "standardizing" management of containers within DataLad framework
  • we have https://github.com/ReproNim/containers with popular neuroimaging images pre-populated, and a shim script to a) assure execution in isolated environment b) shim to run singularity via Docker where needed/possible (OSX)
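
As a rough sketch of the container workflow described in the list above, the Python snippet below assembles the two datalad-container commands typically involved: registering an image in a dataset, then running a command inside it. Nothing is executed. The helper names, the container name, the image URL, and the command are hypothetical placeholders; `containers-add` with `--url` and `containers-run` with `-n` are commands provided by the datalad-container extension.

```python
import shlex


def containers_add(name, url):
    """Build a `datalad containers-add` command (hypothetical helper)."""
    return ["datalad", "containers-add", name, "--url", url]


def containers_run(name, command):
    """Build a `datalad containers-run` command (hypothetical helper)."""
    return ["datalad", "containers-run", "-n", name, command]


# Placeholder container name, image URL, and command, assumed for illustration.
steps = [
    containers_add("example-image", "shub://example-org/example-image"),
    containers_run("example-image", "bash code/analysis.sh"),
]
for argv in steps:
    print(shlex.join(argv))
```

Because the image is registered in the dataset itself, the exact environment travels with the data and the recorded commands.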


davclark commented on May 30, 2024

Hey @yarikoptic!

I am currently "all in" on Docker and haven't had much time for Singularity. Specifically, I currently work at Gigantum where we're focused on automating Docker and Git for mostly single-machine workflows.

I'd talked to Satra about ReproNim a while ago, and I think there's a fundamental challenge in that ReproNim needs to be owned by the folks getting grants, and Gigantum is charting a course towards a revenue based model.

That said, your model of managing containers in git-annex seems sensible. Especially if you can manage container layers in git annex (and so avoid duplicating them).

And, we have the goal of a generic dataset where community drivers can be installed as well... so supporting datalad / git-annex is still in mind. We don't have the bandwidth to drive this right now, but hopefully later this year we'll have that interface clarified and easier to work with.

Anyway, glad you're keeping me and other members of the BCE project up to date! I wonder if there's a more discoverable place to have this conversation? But for now it works to get me to read it at least :P
