
Comments (5)

Evizero commented on August 23, 2024

The way I currently handle this in MLDatasets: after DataDeps does its thing (i.e. checks that the folder exists), I check whether the requested file exists in that folder. If it doesn't, the code assumes that the file should be present but must have been deleted. Consequently, it simply retriggers DataDeps.download and then checks again (see https://github.com/JuliaML/MLDatasets.jl/blob/0fb774033d5c5ac9be4be41ee111209339dfa188/src/io/download.jl#L31-L41).

In other words, I also don't assume that the requested file is in the specified list of to-download files (since, as you say, we don't know what the post-fetch step does). But I think the above is a fair enough assumption.
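The check-then-retry logic described above could be sketched roughly like this. It is an illustration only, not the MLDatasets code: `ensure_file` and the `fetch!` callback are hypothetical names, and `fetch!` stands in for whatever retriggers the DataDeps download.

```julia
# Hypothetical helper: return the path to `file` inside `dir`,
# re-running `fetch!` once if the file is missing.
function ensure_file(fetch!::Function, dir::AbstractString, file::AbstractString)
    path = joinpath(dir, file)
    isfile(path) && return path
    # The folder may exist while the file itself was deleted,
    # so trigger the download again and re-check.
    fetch!(dir)
    isfile(path) || error("file $file is still missing from $dir after re-download")
    return path
end
```

Taking the callback as the first argument allows do-block syntax, e.g. `ensure_file(dir, "test_batch.bin") do d ... end`, with the body doing the actual re-download.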

We could allow this mechanism as part of DataDeps with some syntax based on /. For example, CIFAR10 only downloads a single archive, "cifar-10-binary.tar.gz", but after post_fetch I end up with a subfolder containing a couple of files.

For this, datadep"CIFAR10/cifar-10-batches-bin/test_batch.bin" could mean DataDep "CIFAR10", then subfolder "cifar-10-batches-bin", and then file "test_batch.bin". This could tell DataDeps that if the folder can't be found, or if the specified subfolder/file doesn't exist, it should trigger fetch and post-fetch for "CIFAR10" and then try again. The macro should in the end return the path to the actual file (e.g. "/home/user/.julia/datadeps/CIFAR10/cifar-10-batches-bin/test_batch.bin").
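A minimal sketch of how such a string might be resolved, under the assumptions of this proposal. None of this is existing DataDeps API: `split_datadep`, `resolve_datadep`, the `root` directory argument, and the `fetch!(name, dir)` callback (standing in for fetch plus post-fetch of the whole DataDep) are all hypothetical.

```julia
# Hypothetical: split "NAME/inner/path" into the DataDep name and the
# path inside its folder ("" when only the name is given).
function split_datadep(spec::AbstractString)
    parts = split(spec, '/'; keepempty=false)
    isempty(parts) && throw(ArgumentError("empty datadep string"))
    name = String(first(parts))
    inner = length(parts) > 1 ? joinpath(String.(parts[2:end])...) : ""
    return name, inner
end

# Hypothetical: return the full path, fetching the DataDep once if the
# folder or the requested subfolder/file cannot be found.
function resolve_datadep(fetch!::Function, root::AbstractString, spec::AbstractString)
    name, inner = split_datadep(spec)
    dir = joinpath(root, name)
    path = isempty(inner) ? dir : joinpath(dir, inner)
    if !ispath(path)
        fetch!(name, dir)  # stand-in for fetch + post-fetch of `name`
        ispath(path) || error("$spec not found even after fetching $name")
    end
    return path
end
```

Only the final path is checked in this sketch, so it never looks for the downloaded archive itself.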

from datadeps.jl.

Evizero commented on August 23, 2024

A nice side effect of this is that the existence of the downloaded archive file is never checked. As a consequence, a user could have the dataset pre-downloaded and extracted without keeping the archive file around.


oxinabox commented on August 23, 2024

Which files in particular?

Do you mean in-between the fetch step, and the post-fetch step?
Those files we could indeed check.
If a checksum is provided we kinda do check them, don't we?
I guess the default fallback function (which prints the xor'd hash for everything) might not though.

We can't check the files after the post-fetch step,
since we don't know what the post-fetch step will do.
(E.g. extract, or delete, or synthesize)
Related to that is #6


Evizero commented on August 23, 2024

> We can't check the files after the post-fetch step, since we don't know what the post-fetch step will do.

That is a good point. I'll think on this a little.


oxinabox commented on August 23, 2024

That seems reasonable.

