datalad / datalad
Keep code, data, containers under control with git and git-annex
Home Page: http://datalad.org
License: Other
Primarily this came up as a use case with extracting load from archives: if multiple files need to be "fetched" from e.g. a tarball, it would be a costly operation to request one file at a time.
I have no clue if that is a reasonable thing to request from git-annex -- marking it as such for now. Otherwise we might work around it simply by keeping/caching extracted archives locally for some duration (of the 'get' command, or until a timeout), thus allowing a simple cp/ln operation from that pool.
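The caching idea above can be sketched with plain shell tools; everything here (file names, the cache location, the tarball) is made up purely for illustration:

```shell
# Toy demonstration of the archive-cache idea; all paths are made up.
cache="$PWD/archive-cache"              # hypothetical per-'get' cache pool
mkdir -p "$cache"

# stand-in for the source tarball with several files inside
mkdir -p src/sub01 && echo data > src/sub01/anat.nii
tar czf ds1.tar.gz src

# extract once into the cache ...
mkdir -p "$cache/ds1" && tar xzf ds1.tar.gz -C "$cache/ds1"

# ... then satisfy each requested file with a cheap hard link from the
# pool instead of re-extracting the archive per file:
mkdir -p fetched && ln "$cache/ds1/src/sub01/anat.nii" fetched/anat.nii
```

The expensive extraction happens once per archive, and every subsequent per-file "get" is a link/copy from the pool.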
Reviewing a paper for PLOS, and the annoying thing is that it has plenty of "Supplementary" files which are linked in the PDF but which, even with acroread, don't go to any of those URLs for some reason... so here came a workaround, ugly in implementation but beautiful in its result:
git init; git annex init
strings ../PONE-D-XXXX.pdf \
  | grep 'URI.*editorialmanager' \
  | sed -e 's,.*/URI(\(http://www.edit.*\))>>.*,\1,g' \
  | while read link; do
      fn=$(~datalad/datalad/tools/urlinfo -f $link)
      git annex addurl --pathdepth=1 --file=$fn $link
    done
urlinfo was necessary because of #37 .
So I thought it might provide an interesting and useful use case for the crawler, where the "provider" is a PDF document (instead of e.g. a website) -- the rest goes the same. Now I have fetched some tarballs to be extracted, etc.
A lovely annexificator already exists:
https://github.com/detrout/encode-annex
so worth considering this use-case as well
just an idea
e.g. aria2c can download a file from multiple URLs at once, which can mitigate limited/congested bandwidth.
It would be nice if datalad -- or, even better, git-annex natively -- could allow/try to download content from multiple URLs at once (if multiple URLs are assigned).
Related discussions/posts on git-annex project pages:
"Downloading files from multiple git-annex sources simultaneously"
http://git-annex.branchable.com/todo/Bittorrent-like_features/#index1h1
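A minimal sketch of what such a multi-source fetch could look like with aria2c; the mirror URLs and file name are placeholders, and the command is guarded so it is a no-op where aria2c is not installed:

```shell
# Hypothetical mirrors of the same annexed file (placeholder URLs);
# aria2c treats multiple URIs of one file as sources to segment across.
url1="http://mirror-a.example.org/sub01_bold.nii.gz"
url2="http://mirror-b.example.org/sub01_bold.nii.gz"

if command -v aria2c >/dev/null 2>&1; then
    # -o names the output file; the download is split across both mirrors.
    # "|| true" only because the example hosts are not real.
    aria2c --out=sub01_bold.nii.gz "$url1" "$url2" || true
fi
```

git-annex would still verify the key checksum afterwards, so a partially corrupted multi-source download would be caught.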
takes a while on each clone etc.
As I hinted in the mail, it should just provide a minimalistic view of that repository with only the files necessary for the (unit)test.
ATM it is just the output of wget, so the screen gets filled up with all kinds of stuff quite quickly, ruining sometimes-precious terminal history. In light of #10 it might be worth considering making the default fetching of data less verbose, with more or less just an 'Ok' or 'Fail' status being reported. I also somewhat like how docker manages to report its progress in the terminal (curses?), where multiple lines show progress and not that much of the terminal screen is wasted.
Aspera is a company (http://asperasoft.com, apparently belonging to IBM now) developing products for high-performance transfer -- it intends to fill up the pipe even while going through the WAN. Currently used by NIH (for NCBI) and by HCP.
Pros:
Cons:
Theoretically nothing forbids supporting it, but logistically it might be very messy (see cons).
> git clone --recurse git://github.com/yarikoptic/datalad -b nf-test-repos
...
> cd datalad/datalad/tests/testrepos/basic/r1
> ls -l test-annex.dat
lrwxrwxrwx 1 yoh yoh 186 Feb 23 15:56 test-annex.dat -> .git/annex/objects/zk/71/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b.dat/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b.dat
> git co master
Switched to branch 'master'
$> git annex get test-annex.dat
(merging origin/git-annex into git-annex...)
(recording state in git...)
get test-annex.dat (from web...)
../../../../../.git/modules/datalad/te 100%[=============================================================================>] 4 --.-KB/s in 0s
ok
(recording state in git...)
# indeed upstairs
$> ls -l ../../../../../.git/modules/datalad/tests/testrepos/modules/basic/r1/annex/objects/zk/71/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b.dat/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b.dat
-r-------- 1 yoh yoh 4 Feb 23 15:57 ../../../../../.git/modules/datalad/tests/testrepos/modules/basic/r1/annex/objects/zk/71/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b.dat/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b.dat
$> acpolicy git-annex
git-annex:
Installed: 5.20150205+git57-gc05b522-1~nd80+1
Candidate: 5.20150205+git57-gc05b522-1~nd1
Version table:
*** 5.20150205+git57-gc05b522-1~nd80+1 0
500 http://neuro.debian.net/debian-devel/ jessie/main amd64 Packages
100 /var/lib/dpkg/status
$> acpolicy git
git:
Installed: 1:2.1.4-2.1
Candidate: 1:2.1.4-2.1
Version table:
1:2.1.4+next.20141218-2 0
300 http://http.debian.net/debian/ experimental/main amd64 Packages
*** 1:2.1.4-2.1 0
900 http://http.debian.net/debian/ jessie/main amd64 Packages
600 http://http.debian.net/debian/ sid/main amd64 Packages
100 /var/lib/dpkg/status
I found my way here via the software carpentry website. I don't know a better way to generically contact folks in the project, but feel free to be in touch via email (my github username @berkeley.edu).
We're working on a project to create a common baseline of packages at UC Berkeley called BCE (the Berkeley Common Environment). The tools you are building sound like exactly the direction we'd like to be going in for data management. So, "integrate" here is not a heavy demand, just making sure BCE supports the python packages, etc. that you're relying on.
Note that while we build ubuntu VMs, we don't use dpkg/apt to manage most of our python dependencies (there needs to be a strong reason to do so). We use pip. No slight intended against neurodebian, etc. - but this way you can still easily install the BCE python dependencies on, e.g., OS X. This is our current list of dependencies.
Anywho, git annex is awesome as a backend, but it's still too cumbersome to recommend to computational scientists who aren't necessarily "committed to the cause." I'd love to support your efforts to get more people using git and git annex in a sensible way for science!
It'd be great to get this integrated in BCE. Then folks could very easily learn how to use these tools using VirtualBox, or EC2, or Docker, or whatever else we end up supporting.
cc @aculich
as discovered with psychoinformatics-de/studyforrest-www#5
originally it had read/write for owner/group, and was pushed to a public server where that group was missing
I guess ideally git-annex could have an option for "public" publishing of the annex?
As HCP and NCANDA (via @nicholsn, with 10M files) show, it might be infeasible to aim at supporting the entire beast shipped as a single git/git-annex repository, which suggests modularizing at least at the level of a single subject (or maybe subject/modality or subject/visit), to be managed with e.g. git submodules or some "native" datalad distribution mechanism. But we don't want to lose features such as git-annex "views", since they provide really nice functionality to re-expose that data in an analysis-specific, convenient layout (given that the metadata is contained within annex). So this all requires additional thought on how to go about it.
@joeyh -- any ideas on how to go about growing large repositories partitioning/handling?
With pluggable comparison tools (similar to how git does it)... reviving an old dialog from Jan 2014 which finished on [email protected]
See http://newsoffice.mit.edu/2014/whos-using-your-data-httpa-0613
Something to know about/keep in mind
as far as I can see from a quick code grep, or from this test -- it doesn't have an option to use those:
$> mkdir test; cd test; git init; git annex init; git annex addurl --pathdepth=-1 http://human.brain-map.org/api/v2/well_known_file_download/157722290
Initialized empty Git repository in /tmp/test/.git/
init ok
(Recording state in git...)
addurl 157722290 (downloading http://human.brain-map.org/api/v2/well_known_file_download/157722290 ...)
/tmp/test/.git/annex/tmp/URL--http&c%%huma [ <=> ] 3.07M 391KB/s in 8.3s
ok
(Recording state in git...)
$> ls
157722290@
$> ~datalad/datalad/tools/urlinfo http://human.brain-map.org/api/v2/well_known_file_download/157722290
URL: http://human.brain-map.org/api/v2/well_known_file_download/157722290
Date: Wed, 21 Jan 2015 18:04:35 GMT
Server: Apache
Content-Disposition: attachment; filename="T1.nii.gz"
Content-Transfer-Encoding: binary
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, GET, OPTIONS
Access-Control-Max-Age: 1728000
Cache-Control: private
X-UA-Compatible: IE=Edge,chrome=1
X-Request-Id: aa0a12faabba30384fc13fc9fecda5e2
X-Runtime: 0.007028
X-Rack-Cache: miss
Status: 200 OK
Content-Type: application/octet-stream
Vary: Accept-Encoding
Content-Encoding: gzip
Connection: close
Transfer-Encoding: chunked
Set-Cookie: BIGipServerHuman_Pool=2502510346.20480.0000; path=/
In PyMVPA we have this handy function which reports various information about code and externals, which helps with troubleshooting. It might be nice to adopt a similar one here, but ideally through generalization of the mvpa2.wtf (and externals etc.) functionality into a separate project... Currently projects need to brew similar-but-different functionality for the same purpose; pandas and statsmodels have some already as well.
One of the problems many labs might have is limited (if any) offsite backup of their data. Given large amounts of processed etc. data, it might be worth having a mode where we "publish" repositories to e.g. an external hard drive, copying only "precious" data (e.g. original raw/preprocessed). git-annex has the notion of "preferred content", but it is assigned per "source" repository, whereas we would need something like "preferred content to publish to X", or to just declare some files/directories "precious" (maybe just a tag). Then it would be useful to allow incremental updates of backups on the external drive, stating e.g. "publish --to=/media/drive --tags=precious".
@joeyh - what would be your thoughts on how to wrap this up? tags?
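One possible spelling of this with existing git-annex commands could use the metadata facility plus a plain-directory special remote. This is a hedged sketch: the "publish --to=... --tags=..." UI itself does not exist, and the paths and the remote name "backup" are made up. It is guarded so it is a no-op outside an annex repository:

```shell
drive=/media/drive   # made-up mount point of the external disk

if git annex version >/dev/null 2>&1 && git annex info --fast >/dev/null 2>&1; then
    # declare some content "precious" via git-annex metadata (a real command)
    git annex metadata --set tag=precious data/raw/subject01.dat || true

    # a plain-directory special remote living on the external drive
    git annex initremote backup type=directory directory="$drive" encryption=none || true

    # incremental backup: copy only the tagged content
    git annex copy --to=backup --metadata tag=precious || true
fi
```

Repeating the final copy on later runs would only transfer content not yet present on the drive, giving the incremental behavior described above.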
That could allow the use of largefiles or whatever else to simplify interactions with annex -- matching files would just be automagically added to annex (instead of the index) in a 'pre-add' hook, and/or the user could be alerted that he is trying to commit a large file directly to git while it does not match the largefiles selection, ...
Or am I dreaming, @joeyh? ;-)
Imagine an annex repository which had some files' content fetched (via 'annex get'). Then modifications to some of those (and maybe other) files were done in the origin repository, so if a plain "git pull" is performed, symlinks to the new content would be broken and would require a manual "git annex get" on those files which were previously "get"ed at some point in the past. AFAIK (and from our discussion on IRC, cited below -- thanks patagonicus and scalability-junk for the discussion) there is currently no way to achieve this without an additional script on top, in two possible ways:
FWIW -- here is a protocol of IRC session
10:05 yoh: is there a way to 'upgrade' the content which was 'got' already? e.g. I have some files which got their
content delivered to local annex. then I want to 'git pull' + 'git annex get
those_files_for_which_I_had_content'
10:07 patagonicus: yoh: Probably git annex merge, if pull already got all the branches it needs. No need to do an
extra get, if the data was already copy --to the repo (merge will update your master branch so
that the new file's symlinks appear and the link target will already be there)
10:08 warp: just do git annex sync?
10:09 scalability-junk: yeah not sure why someone would manually git pull and push with git annex sync.
10:09 bremner: sync either gets all content or none
10:09 scalability-junk: bremner: nope it gets all metadata aka branches and git stuff
10:09 scalability-junk: content is only synced with git annex sync --content
10:09 bremner: yes, and then it syncs all content, like I just said
10:09 scalability-junk: and even then it's not all content, but the content which is preferrer or has not
enough duplicates
10:10 patagonicus: bremner: No, it's based on preferred content
10:10 scalability-junk: *preferred
10:10 bremner: ok, fine. It _still_ doesn't answer yoh's question.
10:10 patagonicus: But sync does push/pull all branches, so if you want to only sync some. And if you want
to rewrite history you'll probably have to fetch/push --force and do some resets.
10:11 scalability-junk: rewriting history is a pain
10:11 yoh: keep also in mind that sync tries to be bidirectional -- I want a clean one-way.
10:13 scalability-junk: Wasn't said. Alright so yeah git merge probably
10:18 yoh: checked -- nope -- merge didn't get it and there is no --content for it
10:19 patagonicus: yoh: Are the symlinks there? And you already pushed that file with git annex copy --to
from a different repo?
10:19 yoh: I am just trying with two local repos I made for testing... there is no need to copy --to
10:20 patagonicus: Eh … then how was the content "'got' already"?
10:20 yoh: ok - 1 sec
10:23 yoh: eh -- history a bit messy to share... s215bLgyFs
10:23 yoh: http://slexy.org/view/s215bLgyFs
10:23 yoh: so this way ;) after initial clone I 'get' some interesting file. then if they get modified
in origin, I would like to "update" them locally as well, but only them
10:24 yoh: merge itself doesn't even pull... may be I should fetch before and then merge,... let's see
10:24 patagonicus: Yeah, merge does not do any fetch/pull/push. It just uses the synced/* branches that
are available to update master and git-annex
10:25 patagonicus: You have no line that would transfer the "123" content from d1 to d2, so it's not there.
10:26 patagonicus: You'll have to run a git annex get afterwards. Or add d2 as a remote to d1 and run git
annex copy --to=d2 at any time.
10:26 yoh: I understand that... and that is what I would like to achieve -- that some command does that
content transfer for those files which had local content before
10:26 patagonicus: So, basically git annex sync --content without pushing to the remote?
10:26 yoh: sync would sync/get all the datafiles
10:27 yoh: I would like only those present on client already (server shouldn't be aware of the client, so
no copy --to)
10:28 patagonicus: What do you mean with present? Based on the file names or on the content? Because in
your example you have 1 file but two contents (so two different keys in annex' storage.
One content will be present, one (the 123 one) will not).
10:29 yoh: on the file names
10:30 patagonicus: Oh. That's going to be complicated, I think. So if there's two files, A and B, and the
client has the content of A and both A and B get changed on the server, only the new
content for A should be made available on the client?
10:30 yoh: yes
10:31 yoh: I guess could also be done via retrospection in git/git-annex
10:33 patagonicus: There's no such feature built into annex at the moment (and I'm not sure if it ever
will be added). Basically what you want to do is: run git annex find (which will list
all files currently in the repo, probably with --print0), run git annex merge to update
master, the run git annex get with the result of the find you previously did. That
wouldn't catch file renaming, but for the basic case
10:34 patagonicus: it should work. Needs N*M space, though, where N is the number of files in the local
repo and M is the average file name length.
10:34 patagonicus: I think git annex find --print0 >here && git annex fetch -a && git annex merge && xargs
-0 git annex get <here or something like that should work.
10:35 patagonicus: Will break if there's ever a merge in which file content was changed that is done
without the find/get pair.
10:35 yoh: yeap, something like that
10:36 yoh: because of such corner-cases I think it might be better to do it (optionally?) via full
retrospection -- if each file without content ever had content before and was not explicitly
dropped
10:37 patagonicus: Then the runtime would depend on the number of commits on master (times the number of files).
10:37 patagonicus: Should be doable with a bit of scripting, though.
10:38 patagonicus: The "explicetly" dropped would be harder. Basically you could only find out if any of
the previous versions of a file is still in the local repo. However, if you keep
running git annex unused && git annex dropunused all those would never be in the local
repo.
10:39 yoh: patagonicus: yes, I know. But such full retrospection could then be optional for explicit run
and/or called if manual operation was detected... may be there could even be some record of a
last state when things were "in order" to start from that point
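The find/merge/get recipe patagonicus sketches in the log above can be written out as a script. This is an assumption-laden sketch (remote name, temp file handling), guarded so it does nothing outside an annex repository; as noted in the discussion, it does not handle renames or content changed in merges done without the find/get pair:

```shell
state_file=$(mktemp)   # will hold the list of files that currently have content

if git annex version >/dev/null 2>&1 && git annex info --fast >/dev/null 2>&1; then
    # 1. remember which files have content present locally right now
    git annex find --print0 > "$state_file"

    # 2. bring in new history without transferring any content
    git fetch origin || true
    git annex merge            # updates master and the git-annex branch

    # 3. re-fetch content only for the files that previously had it
    xargs -0 git annex get < "$state_file"
fi
```

As discussed, this needs space proportional to the number of files times the average file-name length, and a retrospection-based variant would instead walk the history of master.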
I wondered if there is already a convenience construct which would allow dropping (maybe forcefully) files which are no longer "of interest".
Use case: carrying out an analysis while keeping results under git-annex control. With reiterations of the analyses, a pile of results accumulates, and maybe the majority of them are not worth keeping at all (not just locally). It would be nice to have the ability to drop all the load which e.g. is not referenced by any commit after X (in a given branch; the situation with multiple branches might be trickier).
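The closest approximation with existing commands seems to be the unused/dropunused machinery: content no longer referenced by any branch or tag becomes "unused" and can then be dropped, though an "after commit X" selector would still be missing. A hedged sketch, guarded so it is a no-op outside an annex repository:

```shell
if git annex version >/dev/null 2>&1 && git annex info --fast >/dev/null 2>&1; then
    # narrow what counts as "used" to a single branch (annex.used-refspec
    # is a real config knob; the value here is just an example)
    git config annex.used-refspec "+refs/heads/master"

    git annex unused                # list keys not referenced by any used ref
    git annex dropunused all        # drop all of them locally
    # git annex dropunused --from=origin all   # ... or from a remote as well
fi
```

This drops everything unreferenced rather than only content superseded after a chosen commit, so the "after X" refinement would still need scripting on top.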
======================================================================
FAIL: Verify that all our repos are clonable
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
File "c:\buildslave\datalad-tests-virtualbox-dl-win7-64\build\datalad\tests\utils.py", line 349, in newfunc
t(repo, *arg, **kw)
File "c:\buildslave\datalad-tests-virtualbox-dl-win7-64\build\datalad\tests\utils.py", line 239, in newfunc
t(*(arg + (filename,)), **kw)
File "c:\buildslave\datalad-tests-virtualbox-dl-win7-64\build\datalad\tests\test_testrepos.py", line 38, in test_clone
eq_(status, 0, msg="Status: %d Output was: %r" % (status, output))
AssertionError: Status: 1 Output was: "'{' is not recognized as an internal or external command,\noperable program or batch file.
Message seems to be way off for figuring out WTF. Reference:
http://smaug.datalad.org:8020/builders/datalad-tests-virtualbox-dl-win7-64/builds/3/steps/nosetests/logs/stdio
http://xnat.org is the most widely used FOSS platform for neuroimaging data management and distribution. Used by openfmri, nitrc, humanconnectome, etc.
Pros:
TODOs:
http://coins.mrn.org is a FOSS-wannabe platform used by many researchers.
TODOs
Issue to be closed during #closember 2021 at the latest.
All recent builds seem to get stuck at
https://travis-ci.org/datalad/datalad/builds/37287440#L195
2014-10-07 13:25:49,086 [DEBUG ] Running: cd /tmp/tmpDMOIrj && git annex add -c annex.alwayscommit=false "files/1" (cmd.py:100)
and since we are 'sponging' all output before we can see it, adding --debug to the git annex call didn't help provide more information....
any ideas on how to troubleshoot this one @datalad/developers ?
This is just a placeholder to point to the problem of failing tests (such as #67) on Windows. Some of them are due to max path length limit (#58) but there might be more. Just requires a thorough pass
Reference: http://smaug.datalad.org:8020/builders/datalad-tests-virtualbox-dl-win7-64/builds/3/steps/nosetests/logs/stdio
now it is USER@HOST:PATH
I am not exactly sure what the right solution would be. If nipype's DataGrabber/Finder discovers a dead symlink it should at least complain/warn. But apart from that I am not certain whether it should refuse to run if in the matched set is a dead symlink, or whether it should exclude them, because they are practically not present.
Just a topic for possible discussion etc
Just for the sake of seeing how big a really big dataset would get, I am "simulating" an annex repository with all the files distributed by the HCP 500-subject release, which, as deposited to S3, has >5,600,000 files. It has been running for a while now ;)
Some of the initial bottlenecks I have detected:
The script I am using is here: https://github.com/datalad/datalad/blob/master/tools/mimic_repo (really not sure why I did it in Python instead of a simple bash script). And here are the corresponding 's3cmd ls' outputs I am running on:
http://www.onerussian.com/tmp/hcp-ls.20141020-500subject-only.txt.gz
or only just a 1000 (to give a try)
http://www.onerussian.com/tmp/hcp-ls.20141020-500subject-only1000.txt.gz
so you could run e.g. with
./mimic_repo hcp-ls.20141020-500subject-only1000.txt.gz /tmp/testtt
Provides portal to upload software/data with versioning etc. See
https://zenodo.org/search?p=980__a:dataset
so it might be useful to keep in mind, at least for "crawling" and exposing via annex -- and ideally for publishing to ;)
During SfN, Jeff Teeters shared good news on the availability of alternative methods of downloading datasets from crcns.org which should support versioning etc.; research in more detail:
http://crcns.org/download
probably by taking **kwargs and pop'ing only a limited set of arguments, then passing *args and the rest of **kwargs deeper inside
test also by nesting multiple with_tempfile's
when http://github.com/datalad/brainfacts--2012-edition was created, the index for the git-annex branch with all the urls was left uncommitted
not neuroimaging, but a good/interesting use case (versioned files via suffixes), available via aspera or S3 (a bucket without versioning; versioning was not enabled even though advised in private correspondence). Locations: http://ftp.ncbi.nlm.nih.gov/1000genomes/ and s3://1000genomes
Contains only 462826 files (including versioned copies) on S3 atm.
A really wild question: to provide access to S3 buckets not exposed publicly (thus requiring authentication via IAM credentials, such as HCP), and through prefixes/revision ids instead of just a pure keystore (as in the native git-annex S3 special remote), we would need to provide an external special remote which would again need to implement S3 client capabilities for basic authentication/fetching. It can be done, but we would grow dependencies implementing the same functionality that git-annex already has inside.
So a wild thought came: maybe git-annex could somehow expose its S3 functionality to e.g. external special remotes through some API? ;) It could well be made into something even more generic: git-annex already has capabilities for talking to a plethora of data hosting providers and thus could maybe serve as a glorified "ultimate downloader" -- or, say, expose itself as yet another special remote capable of TRANSFER.
As we originally planned, we would need custom helpers to fetch data from e.g. image databases which neither expose their data via HTTP nor provide a universal "key store"-like facility, prohibiting development of dedicated special-remote handlers for them.
One way to mitigate this would be to have in git-annex some way to specify custom "downloaders" for a given URI pattern, e.g. for the URL regexp "http.*.torrent" still use the built-in wget/curl for checking content presence but aria2c %(url)s
for fetching the content. Such custom downloaders would also have their own registry of authorization credentials for data sources requiring them. We could also support extraction from locally present archives via custom downloaders, see e.g.
Line 277 in 54b71be
If support for custom downloaders is not implemented directly in git-annex, it might be implementable in datalad in many cases by "proxying" calls to wget and/or curl and then checking/fetching content accordingly.
Currently annex only has support for regular wget/curl and quvi (for youtube videos), with hardcoded logic.
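The "proxying" fallback mentioned above could look roughly like the following: a small wget wrapper placed earlier in PATH that dispatches on the URL pattern. All names here (the wrapper file name, aria2c as the torrent handler, the real wget's path) are illustrative assumptions, not an existing datalad mechanism:

```shell
# Write a wget stand-in that hands .torrent URLs to a custom downloader
# and passes everything else through to the real wget.
cat > wget-proxy <<'EOF'
#!/bin/sh
case "$*" in
    *.torrent*) exec aria2c "$@" ;;        # custom downloader for torrents
    *)          exec /usr/bin/wget "$@" ;; # everything else: the real wget
esac
EOF
chmod +x wget-proxy
```

Prepending the wrapper's directory to PATH before invoking git-annex would then route annex's wget calls through the dispatcher.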
Related:
http://www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html
a collection of .mat files + scripts in .tar.gz
just thought it might be nice to redistribute it, thus decided to file it here
For now this is intended as a reminder to figure things out.
Just stumbled upon a case where I 'git annex init' a repo and then got "init ok" on stdout, "fatal: ref HEAD is not a symbolic ref" on stderr, and exit code 0 (or at least None).
So, datalad didn't notice something went wrong. Needs further investigation.
https://openfmri.org is a popular NSF-funded project to federate a rich collection of neuroimaging data from cognitive experiments.
Pros:
Cons:
Use case, again not neuroimaging but interesting, brought up by Don Armstrong
http://www.ncbi.nlm.nih.gov/sra
sra-toolkit providing tools for conversion and visualization is in Debian
Initial use was to link load from incoming to public repositories...
to mitigate limitation on the maximum path length (up to 260 chars total)
Echoing the discussion we are having in #17, I wanted to also bring up one about non-direct (i.e. regular/classical) annexes and their impact. As http://git-annex.branchable.com/internals/ and http://git-annex.branchable.com/internals/lockdown/ outline, to prevent accidental removal of a file, every file is placed under a .git/annex/objects/aa/bb/KEY/KEY
directory, where the KEY directory has the 'w' permission taken away so the KEY file can't be removed. It indeed solves the accidental-removal issue, but
There was a recent comment http://git-annex.branchable.com/internals/lockdown/#comment-f77526824d026f213ea98939fda9ac4c possibly on a linux-specific way, but I wondered: why not just take the 'w' permission away from the .git/annex/objects/aa/bb
directories and store the files directly under them? Yes, for any get/drop that directory's permission would need to flip to writable, but with proper guards around it, and fsck checking (or just adding some lock file which, if not removed, would signal that one of the underlying directories might have been left writable), I bet it should be feasible to make it work quite reliably without sacrificing FS metadata real estate/performance?
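The alternative layout proposed here can be modeled with a throwaway directory; the names below mimic the annex object tree but are purely illustrative:

```shell
# Toy model: protect keys by removing 'w' from the hashed parent
# directory rather than from a per-key subdirectory.
mkdir -p objects/aa/bb
echo content > objects/aa/bb/KEY
chmod a-w objects/aa/bb    # non-root users can no longer unlink or add files here

# a guarded 'get'/'drop' would briefly flip the bit back ...
chmod u+w objects/aa/bb
touch objects/aa/bb/KEY2   # ... so modifications succeed again
# (annex would then re-run "chmod a-w objects/aa/bb" to re-lock; left
# writable here only so the demo directory can be cleaned up)
```

This saves one directory inode per key compared to the current KEY/KEY scheme, at the cost of the guard/fsck machinery discussed above.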
Now that annex has built-in support for torrents (http://git-annex.branchable.com/devblog/day_239-240__bittorrent_remote/) we might consider exposing torrents from http://academictorrents.com/browse.php?cat=6 -- Chris's test-retest and a few other related datasets (http://academictorrents.com/browse.php?search=fmri&c6=1) are available there
If I have a big dataset and I want to do a number of analyses with multiple users in a shared computing environment, it would be nice to be able to do that in a way that has minimal impact on storage demands.
In my concrete case, I have about 100GB of raw input data that is currently copied for each clone of the dataset handle. Of course I can avoid that by manually hard/soft linking the relevant files and only later injecting potential results or derived data into the original annex. However, this breaks the workflow with dataset handles.
Not really an issue, just a comment on your top-level readme:
It is currently in a "prototype" state, i.e. a mess.
that's classic!
NB: I thought I had filed it here, but it must have been elsewhere; a backlink will be provided later
git annex view
provides great functionality for taking advantage of metadata (tags) associated with data files to build custom views. One stumbling point might often be the "installation" of large datasets where only a handful of files are actually needed/used. ATM it results in a directory hierarchy where possibly the majority of files are broken links, which makes navigation difficult and unproductive.
It would be great if there were a way to generate a "lean" view where only files with content available are visible.
Somewhat related would be #6, i.e. carrying out an update while maintaining a lean view
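An approximation with existing commands: plain "git annex find" lists only files whose content is present, so a lean tree can be mimicked by symlinking just those files aside. "lean-view" is a made-up name, the loop assumes bash (for read -d ''), and the whole thing is guarded so it is a no-op outside an annex repository:

```shell
mkdir -p lean-view

if git annex version >/dev/null 2>&1 && git annex info --fast >/dev/null 2>&1; then
    # for each file with content present, mirror it into lean-view/
    git annex find --print0 | while IFS= read -r -d '' f; do
        mkdir -p "lean-view/$(dirname "$f")"
        ln -s "$PWD/$f" "lean-view/$f"
    done
fi
```

A built-in "lean" mode of git annex view could presumably do the same filtering natively, and keep the view updated as content arrives or is dropped.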
It is often needed to provide anonymous access to data (or a subset) for peer-review. We should have (at least) some documentation on how to achieve this with datalad.
now if we run
$> nosetests -s -v datalad/tests/
datalad.tests.test_annexrepo.test_AnnexRepo_instance_brand_new ... ok
datalad.tests.test_annexrepo.test_AnnexRepo_instance_from_clone ... 2015-02-25 13:19:13,261 [ERROR ] 'git clone -v /home/yoh/proj/datalad/datalad/datalad/tests/testrepos/basic/r1 /home/yoh/.tmp/tmptoMerM' returned with exit code 128
| stderr: 'fatal: destination path '/home/yoh/.tmp/tmptoMerM' already exists and is not an empty directory.
| ' (gitrepo.py:59)
ok
...
we get them spit out to the screen, which is not proper since they are intended to error out at that point (right, @bpoldrack?), so we need to swallow them.
In fail2ban we have a derived TestCase class for that purpose: https://github.com/fail2ban/fail2ban/blob/HEAD/fail2ban/tests/utils.py#L200 which also allows analyzing those logs for testing (which we would need here). But here we'd better come up with a decorator which captures the logs and exposes them within the test. Any takers? ;)