

joeyh commented on June 9, 2024

Well, git-annex doesn't have S3 functionality inside. It uses a library.
I think that using a library is a good solution.

I mean, look at git-annex's code to retrieve a file from S3:

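    retrieve :: S3Handle -> Retriever
    retrieve h = fileRetriever $ \f k p -> liftIO $ runResourceT $ do
        -- Open the destination file; runResourceT closes it even on exceptions.
        (fr, fh) <- allocate (openFile f WriteMode) hClose
        -- The only S3-specific lines: build a GetObject request and send it.
        let req = S3.getObject (bucket info) (bucketObject info k)
        S3.GetObjectResponse { S3.gorResponse = rsp } <- sendS3Handle' h req
        -- Stream the response body through the progress-reporting sink.
        responseBody rsp $$+- sinkprogressfile fh p zeroBytesProcessed
        release fr
      where
        info = hinfo h
        -- Conduit sink: for each chunk, update the meter with the running
        -- total, then write the chunk to the file.
        sinkprogressfile fh meterupdate sofar = do
            mbs <- await
            case mbs of
                Nothing -> return ()
                Just bs -> do
                    let sofar' = addBytesProcessed sofar (S.length bs)
                    liftIO $ do
                        void $ meterupdate sofar'
                        S.hPut fh bs
                    sinkprogressfile fh meterupdate sofar'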

Only two lines of that have anything to do with S3. All the rest of it
is about streaming the bytes out to a file while updating the progress meter.
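To make the non-S3 part concrete, here is a minimal sketch of the same streaming pattern without conduit or an S3 library, assuming a plain file `Handle` as the source in place of the HTTP response body. The names `copyWithProgress`, `copyFileWithProgress` and `ProgressCallback`, and the 64 KiB chunk size, are hypothetical illustrations, not git-annex code.

```haskell
import qualified Data.ByteString as S
import System.IO (Handle, IOMode (ReadMode, WriteMode), withBinaryFile)

-- Hypothetical progress callback: receives the cumulative number of bytes
-- written so far (a stand-in for git-annex's MeterUpdate).
type ProgressCallback = Integer -> IO ()

-- Copy everything from src to dst in chunks, reporting the running byte
-- total after each chunk. Returns the total number of bytes copied.
copyWithProgress :: ProgressCallback -> Handle -> Handle -> IO Integer
copyWithProgress report src dst = go 0
  where
    chunkSize = 65536 -- assumed chunk size; any positive value works
    go sofar = do
      bs <- S.hGetSome src chunkSize
      if S.null bs
        then return sofar -- empty chunk means end of input
        else do
          let sofar' = sofar + fromIntegral (S.length bs)
          report sofar' -- update the meter before writing, as in the excerpt
          S.hPut dst bs
          go sofar'

-- Copy one file to another with progress reporting. withBinaryFile closes
-- both handles even if an exception is thrown, playing the role that
-- allocate/runResourceT play in the excerpt.
copyFileWithProgress :: ProgressCallback -> FilePath -> FilePath -> IO Integer
copyFileWithProgress report from to =
  withBinaryFile from ReadMode $ \src ->
    withBinaryFile to WriteMode $ \dst ->
      copyWithProgress report src dst
```

In GHCi, `copyFileWithProgress print "in.bin" "out.bin"` prints the running byte total after each chunk and returns the final count.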

And all of it is written to the abstraction level git-annex needs, which
is "retrieve this Key with progress sent to this MeterUpdate, and
provide a ContentSource to the passed callback action to consume it",
which is not the abstraction level a general-purpose library would need.
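That abstraction can be read as a type. The following is a minimal sketch under stated assumptions: `Key`, `MeterUpdate`, `ContentSource` and `Retriever` here are simplified, hypothetical stand-ins (plain `IO` instead of git-annex's own monad, a type parameter for the callback's result), and `memoryRetriever` and `storeTo` are invented for illustration. None of this is git-annex's actual API.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical, simplified stand-ins loosely modelled on the description
-- above; these are NOT git-annex's actual definitions.
newtype Key = Key String
  deriving (Eq, Show)

-- Receives the cumulative number of bytes retrieved so far.
type MeterUpdate = Integer -> IO ()

-- What a backend hands to the caller's consumer.
data ContentSource
  = ByteContent B.ByteString -- content held in memory
  | FileContent FilePath     -- content the backend already wrote to a file

-- "Retrieve this Key with progress sent to this MeterUpdate, and provide a
-- ContentSource to the passed callback action to consume it."
type Retriever a = Key -> MeterUpdate -> (ContentSource -> IO a) -> IO a

-- Toy backend (assumption for illustration): keys map to in-memory blobs.
memoryRetriever :: [(Key, B.ByteString)] -> Retriever a
memoryRetriever store key meter consume =
  case lookup key store of
    Nothing -> ioError (userError ("key not present: " ++ show key))
    Just bs -> do
      meter (fromIntegral (B.length bs)) -- one update: everything is local
      consume (ByteContent bs)

-- Caller-side consumer: put the content at dest, whichever form it came in.
storeTo :: FilePath -> ContentSource -> IO Bool
storeTo dest (ByteContent bs) = B.writeFile dest bs >> return True
storeTo dest (FileContent f) = B.readFile f >>= B.writeFile dest >> return True
```

The callback shape inverts control: the backend decides how content arrives (bytes in memory or a file it already wrote) and reports progress, while the caller decides what happens to the content. A general-purpose S3 library would more naturally expose something like "download this object to this path", which is the mismatch in abstraction level described above.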

(Nor is it a stable abstraction: all this has been massively reworked once in
the last year, and may well be again. I want flexibility in the implementation
internals so I can make them better; exposing a library API is counter to
that.)

see shy jo

from datalad.

yarikoptic commented on June 9, 2024

"It uses a library. I think that using a library is a good solution"
yes -- it is a great solution to use a library whenever it is available ;)
What I meant is that to provide a custom downloader from S3 we will also need to use some library. But by 'using' a library, git-annex already has all the needed functionality built in to provide, e.g., fetches from S3. Moreover, you already have a "proper" progress meter update for git-annex here. Wouldn't it be neat then (this is all just food for thought, feel free to just say No and close) if, from within my custom special remote, I could rely on git-annex's built-in functionality (and thus its "standard" progress meter) to fetch from S3 instead of relying on one more library (e.g. s3cmd) on my end?


yarikoptic commented on June 9, 2024

Since interactions with S3 could be quite tricky anyway, we will just handle them on our side through boto.

