Hey,
I am currently trying to download the eNKI dataset from https://coins.trendscenter.org/
I can log in on that page and go to "download", where I get a list of ~500 zips that I can click on, and my browser then downloads them.
At the end of the list of zips, I can also get a list of ~500 URLs that lead to the downloads.
Example: "https://dx-download.trendscenter.org/dataDownloadCenter/?c=secret&[email protected]"
So my plan was to iterate over this list of URLs (which I copied into a text file) with `datalad download-url`. This works for ~100 of the 500 files.
For the rest I get a set of different errors:
- download_url(error): /data/project/rehab_eNKI/eNKI_data/ (file) [Downloaded size 681181184 differs from originally announced 5262639843 [base.py:_verify_download:351]]
- download_url(error): /data/project/rehab_eNKI/eNKI_data/ (file) [Failed to establish a new session 1 times. Last exception was: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) [adapters.py:send:498] [http.py:get_downloader_session:506]]
- download_url(error): /data/project/rehab_eNKI/eNKI_data/ (file) [Access to https://dx-download.trendscenter.org/dataDownloadCenter/?c=secret&[email protected] has failed: status code 502 [http.py:check_response_status:116]]
- download_url(error): /data/project/rehab_eNKI/eNKI_data/ (file) [Access to https://dx-download.trendscenter.org/dataDownloadCenter/?c=secret&[email protected] has failed: status code 504 [http.py:check_response_status:116]]
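To sort the failures of a batch run, a small Python sketch can map each error line to a coarse category. The patterns are taken from the four messages quoted above; the function name `classify_error` and its labels are made up for illustration and are not part of datalad. All four kinds seem to point at the server or network side (a transfer cut short, a dropped connection, a gateway error or timeout) rather than at the local setup, though that is only a reading of the messages.

```python
import re

# Patterns taken from the error messages quoted in this thread; the labels
# are illustrative only, not an official datalad classification.
_STATUS_RE = re.compile(r"status code (\d{3})")
_SIZE_RE = re.compile(r"Downloaded size (\d+) differs from originally announced (\d+)")


def classify_error(message):
    """Return a short label describing a datalad download_url error message."""
    size = _SIZE_RE.search(message)
    if size:
        got, announced = map(int, size.groups())
        return f"truncated ({got}/{announced} bytes)"
    status = _STATUS_RE.search(message)
    if status:
        code = int(status.group(1))
        names = {502: "bad gateway", 504: "gateway timeout"}
        return f"HTTP {code} ({names.get(code, 'other')})"
    if "Connection aborted" in message or "RemoteDisconnected" in message:
        return "connection dropped"
    return "unknown"
```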
Additional Information:
My first thought was that the authentication might "break" for the URLs after some time. But if I delete a file that I could download, I can download it again, so the links have not expired.
I checked a few archives that I could download; they contain, for example, NIfTI files that I could open normally. But I am not aware of a way to check whether the data might be incomplete or even contain errors.
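One way to check a downloaded zip for completeness is Python's standard `zipfile` module, sketched below. A truncated file usually lacks the central directory stored at the end of a zip, so it fails to open at all; for archives that do open, `ZipFile.testzip()` reads every member and verifies its CRC-32. The helper name `check_zip` is hypothetical. This checks only the zip container, not whether the NIfTI files inside are scientifically valid.

```python
import io
import zipfile
import zlib


def check_zip(source):
    """Return (ok, detail) for a zip given as a file path or as raw bytes.

    Opening fails with BadZipFile when the end-of-archive record is missing
    (typical for a truncated download). testzip() then decompresses every
    member and returns the name of the first one whose CRC does not match.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    try:
        with zipfile.ZipFile(source) as zf:
            bad = zf.testzip()
    except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
        return False, f"unreadable archive: {exc}"
    if bad is not None:
        return False, f"CRC mismatch in member {bad}"
    return True, "all members passed CRC check"
```

For example, `check_zip("scan_data003.zip")` returns `(True, ...)` only if every member decompresses cleanly.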
from datalad.
> So my plan was, to iterate over this list of urls (that I copied into a textfile) with datalad download-url, which works for ~100 of the 500 files.

Why not try using `datalad addurls`?
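`datalad addurls` reads a table (for example a CSV file) and takes format strings that name the URL column and the target file name, as in `datalad addurls urls.csv '{url}' '{filename}'`. Since these download URLs carry only query parameters and no usable file name, the sketch below generates names from a hypothetical pattern (`archive_001.zip`, ...); the helper name and the file `urls.csv` are assumptions, not anything the platform provides.

```python
import csv
import io


def urls_to_addurls_csv(urls, name_pattern="archive_{:03d}.zip"):
    """Render a CSV table with `url` and `filename` columns for datalad addurls.

    File names come from the hypothetical `name_pattern`, numbered from 1,
    because the URLs themselves contain no file name.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["url", "filename"], lineterminator="\n")
    writer.writeheader()
    for i, url in enumerate(urls, start=1):
        writer.writerow({"url": url, "filename": name_pattern.format(i)})
    return out.getvalue()
```

Writing the returned text to `urls.csv` and running the `addurls` command above would register all files in one call instead of ~500 separate `download-url` invocations.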
502 == bad gateway, 504 == gateway timeout... Can you download those particular ones fine using a browser? What would be the size? Are any errors reported in the console? Does the initially reported size match what ends up being downloaded? I could hypothesize about what could possibly go wrong, but it is better to get more information first.
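The announced size can also be queried without a browser. The sketch below sends an HTTP HEAD request with Python's standard library and reads `Content-Length`. It assumes the server answers HEAD for these links the same way it answers GET, which has not been verified here; note that `urlopen` raises `urllib.error.HTTPError` for statuses such as 502 or 504. The helper names are hypothetical.

```python
import urllib.request


def announced_size(headers):
    """Return Content-Length from a header mapping as an int, or None if absent."""
    value = headers.get("Content-Length")
    return int(value) if value is not None else None


def probe(url, timeout=60):
    """Send a HEAD request to `url` and return (HTTP status, announced size in bytes).

    Assumes the server supports HEAD; raises urllib.error.HTTPError on 4xx/5xx.
    """
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, announced_size(resp.headers)
```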
from datalad.
I chose `datalad download-url` because it worked when I downloaded the first archive, so I ran it in parallel for all links. I will also try `datalad addurls`.
After downloading two archives both ways and comparing their sizes with `du`:
- archive 1: 8247640 via browser, but 8243934 via `datalad download-url`
- archive 2: 6036660 via browser, but 5856524 via `datalad download-url`
```
datalad download-url
download_url(ok): /data/project/rehab_eNKI/eNKI_data/scan_data003.zip (file)
add(ok): scan_data003.zip (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  download_url (ok: 1)
  save (ok: 1)
```
The console does not report errors.
`unzip scan_data003.zip` works without error messages, but these archives are probably all incomplete.
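`du` reports disk usage (allocated blocks, by default in units of 1024 bytes for GNU `du`), not the exact byte length, so an exact comparison of the browser and datalad copies is more conclusive: byte sizes (`du -b` or `stat`) plus a checksum. A minimal Python sketch, with hypothetical helper names:

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size in bytes, SHA-256 hex digest) of the file at `path`.

    os.path.getsize() and open() follow symlinks, so this also works on
    annexed files that are symlinks into .git/annex/objects.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def same_download(path_a, path_b):
    """True if two downloads of the same archive are byte-for-byte identical."""
    return file_fingerprint(path_a) == file_fingerprint(path_b)
```

If the fingerprints of the two copies of `scan_data003.zip` match, the `du` difference comes from the filesystem rather than from the content.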
Restarting the download for all missing archives succeeded for ~30 archives that had previously failed with errors.
The logs for the ones that are still missing contain the same errors as above.
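Since simply re-running recovered some archives, the failures look at least partly transient. A retry loop with exponentially growing pauses automates that and spaces out requests to an overloaded gateway. The helper `retry` below is a hypothetical sketch; `sleep` is injectable so the delays can be checked without waiting.

```python
import time


def retry(action, attempts=5, base_delay=30.0, sleep=time.sleep):
    """Call `action()` until it returns True or `attempts` calls have been made.

    After the k-th failed call (k = 0, 1, ...) it waits base_delay * 2**k
    seconds, except after the last attempt. Returns the number of calls made
    on success, or None if every attempt failed.
    """
    for attempt in range(attempts):
        if action():
            return attempt + 1
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None
```

To wrap a single download, `action` could be something like `lambda: subprocess.run(["datalad", "download-url", url]).returncode == 0`, assuming `datalad` exits with a non-zero status when a download fails.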
from datalad.
So it feels like we have a bug! :-( Let's continue in #4042, and please cut/paste the output of `datalad wtf -D html_details` there. I will try to get to it next week unless someone beats me to it. Troubleshooting access to a closed-source (although loudly stated/promised to be open) platform is not on my priority list.
from datalad.
I will close this for-our-information-style issue now, due to its persistent inactive state. This does not mean that support for this will not be added. It just reflects the evident lack of capacity to work on this any time soon.
from datalad.
Related Issues (20)
- pytest collection fails on recentish neurodebians: Argument(s) {'collection_path'} are declared in the hookimpl but can not be found in the hookspec
- datalad siblings enable fails in git-cloned dataset without git-annex branch
- parallel get from datalad archive gives error
- Brainstorming: path to DataLad v2?
- Install datalad by easybuild
- datalad update fails randomly with error: "cannot lock ref 'refs/remotes/origin/master'" and ".... git-annex"
- Github tarball checksums changed
- Different HPC systems and users
- Add ability to limit get (and thus install) --recursive installation of subdatasets
- Edge case: Large datalad saves with tight ulimits on many-core machines can fail
- 1-letter shortcut for `--reobtain-data` in datalad-update
- `str(GitTransportRI)` broken, and with it `_get_flexible_source_candidates()`
- Boto dependency
- Extension command line argument in conflict with `datalad` level argument
- "Convert" .travis.yml into a github workflow
- DataLad extensions are not properly registered on Python 3.12
- FOI: "generic" analog to WTF?
- Datalad get can't find URL despite registering via addurls (and I can see the URL with git annex whereis)
- `create_sibling_ria` does not release `IO` handler resources properly
- MacOS tests fail to install Python 3.7 (which is EOL anyway)