
Comments (5)

TobiasKadelka commented on May 31, 2024

Hey,
I am currently trying to download the eNKI dataset from https://coins.trendscenter.org/
I can log in on that page and go to "download", where I get a list of ~500 zips; when I click on one, my browser offers to download it.

At the end of the list of zips, I can also get a list of ~500 URLs that lead to the downloads.
Example: "https://dx-download.trendscenter.org/dataDownloadCenter/?c=secret&[email protected]"
So my plan was to iterate over this list of URLs (which I copied into a text file) with datalad download-url, which works for ~100 of the 500 files.
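For reference, the loop described above could be sketched in Python roughly as follows. This is only an illustration, not the script that was actually used: the file name `urls.txt` and the helper names `read_urls` and `download_all` are hypothetical, and the sketch assumes that `datalad download-url` exits with a non-zero status when a download fails.

```python
import subprocess
from pathlib import Path


def read_urls(list_file):
    """Return the non-empty, non-comment lines of a URL list file."""
    lines = Path(list_file).read_text().splitlines()
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.lstrip().startswith("#")]


def download_all(urls, run=subprocess.run):
    """Call `datalad download-url` once per URL; return the URLs that failed.

    `run` is injectable so the loop can be exercised without datalad.
    """
    failed = []
    for url in urls:
        # An argument list (no shell) avoids quoting problems with '&' and '?'.
        result = run(["datalad", "download-url", url])
        if result.returncode != 0:  # assumption: non-zero exit means failure
            failed.append(url)
    return failed


if __name__ == "__main__":
    # "urls.txt" is a hypothetical name for the copied URL list.
    still_missing = download_all(read_urls("urls.txt"))
    print(f"{len(still_missing)} downloads failed")
```

Returning the failed URLs makes it easy to feed them into a second pass later.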

For the rest I get a set of different errors:

  • download_url(error): /data/project/rehab_eNKI/eNKI_data/ (file) [Downloaded size 681181184 differs from originally announced 5262639843 [base.py:_verify_download:351]]

  • download_url(error): /data/project/rehab_eNKI/eNKI_data/ (file) [Failed to establish a new session 1 times. Last exception was: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) [adapters.py:send:498] [http.py:get_downloader_session:506]]

  • download_url(error): /data/project/rehab_eNKI/eNKI_data/ (file) [Access to https://dx-download.trendscenter.org/dataDownloadCenter/?c=secret&[email protected] has failed: status code 502 [http.py:check_response_status:116]]

  • download_url(error): /data/project/rehab_eNKI/eNKI_data/ (file) [Access to https://dx-download.trendscenter.org/dataDownloadCenter/?c=secret&[email protected] has failed: status code 504 [http.py:check_response_status:116]]

Additional Information:
My first thought was that the authentication might be "broken" for the URLs after some time. But if I delete a file that I could download, I can download it again, so the links are not expired.

I checked a few archives that I could download; they contain, for example, NIfTI files that I could open "normally". But I am not aware of a way to check whether the data might be incomplete or even contain errors.
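One possible check, sketched here under the assumption that the archives are ordinary zip files, is to let Python's standard `zipfile` module read every member and compare its CRC-32 checksum. The helper name `check_zip` is made up for this example. Note that this only shows that an archive is internally consistent; it cannot prove that it is identical to the copy on the server.

```python
import zipfile


def check_zip(path):
    """Return None if the archive looks intact, else a short description."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads all members and verifies their CRC-32 values;
            # it returns the name of the first bad member, or None.
            bad_member = zf.testzip()
    except zipfile.BadZipFile as exc:
        # Raised, for example, when the central directory at the end of the
        # file is missing, which is what a truncated download looks like.
        return f"not a valid zip file: {exc}"
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return None
```

Calling `check_zip("scan_data003.zip")` on a downloaded archive would then return `None` if nothing suspicious was found.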

@mih

from datalad.

yarikoptic commented on May 31, 2024

So my plan was to iterate over this list of URLs (which I copied into a text file) with datalad download-url, which works for ~100 of the 500 files.

Why not try using datalad addurls?

502 == bad gateway, 504 == gateway timeout... Can you download those particular ones cleanly using a browser? What would be the size? Are any errors reported in the console? Does the initially reported size match what ends up being downloaded? I could hypothesize about what could possibly go wrong, but it is better to get more "information" first.
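Because 502 and 504 are often transient server-side conditions, one possible workaround while the cause is unknown is to retry only those requests with an increasing delay. The sketch below is a generic pattern and not something datalad is known to do here; `HTTPStatusError`, `retry_transient`, and the `fetch` callable are hypothetical names introduced for the example.

```python
import time

TRANSIENT_STATUS = {502, 504}  # bad gateway, gateway timeout


class HTTPStatusError(Exception):
    """Hypothetical error type that carries an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"status code {status}")
        self.status = status


def retry_transient(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, retrying only on 502/504.

    Any other status, or a transient one on the last attempt, is re-raised.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except HTTPStatusError as exc:
            is_last = attempt == attempts - 1
            if exc.status not in TRANSIENT_STATUS or is_last:
                raise
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
```

Here `fetch` stands for whatever performs a single download attempt and raises `HTTPStatusError` on a bad response; permanent errors such as 404 are deliberately not retried.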


TobiasKadelka commented on May 31, 2024

I chose "datalad download-url" because it worked for downloading the first archive via error, so I ran that in parallel for all links. I will also try datalad addurls.

After downloading two archives both ways and comparing their sizes with du:
8247640 via browser, but
8243934 via datalad download-url

6036660 via browser
5856524 with datalad download-url
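Since du reports disk usage rather than the exact length of a file, a more conclusive comparison would look at byte sizes and checksums. The following is a minimal sketch of such a comparison; the function names `sha256_of` and `compare_downloads` are made up for illustration, and the two paths are whatever the browser copy and the datalad copy are called locally.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_downloads(path_a, path_b):
    """Compare two copies of the same archive by byte size and content."""
    # os.path.getsize follows symlinks, so it reports the size of the
    # file a link points to rather than the size of the link itself.
    size_a = os.path.getsize(path_a)
    size_b = os.path.getsize(path_b)
    same_size = size_a == size_b
    return {
        "size_a": size_a,
        "size_b": size_b,
        "same_size": same_size,
        # Hashing is skipped when the sizes already differ.
        "same_content": same_size and sha256_of(path_a) == sha256_of(path_b),
    }
```

If `same_size` is false, the difference `size_a - size_b` tells how many bytes one copy is short.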

```
datalad download-url
download_url(ok): /data/project/rehab_eNKI/eNKI_data/scan_data003.zip (file)
add(ok): scan_data003.zip (file)
save(ok): . (dataset)
action summary:
add (ok: 1)
download_url (ok: 1)
save (ok: 1)
```

The console does not report errors.

unzip scan_data003.zip

works without error messages, but these archives are probably all incomplete.

Restarting the download for all missing archives succeeded for ~30 archives that had given error messages before.
The logs for the ones that are still missing contain the errors from above.


yarikoptic commented on May 31, 2024

so it feels like we have a bug! :-( Let's continue in #4042 and please cut/paste there output from datalad wtf -D html_details. I will try to get to it next week unless someone beats me to it. Troubleshooting access to a closed-source (although loudly stated/promised to be open) platform is not on my priority list.


mih commented on May 31, 2024

I will close this for-our-information-style issue now, due to its persistent inactive state. This does not mean that support for this will not be added. It just reflects the evident insufficient capacity to work on this any time soon.

