Comments (3)
Well, git-annex doesn't have S3 functionality inside. It uses a library.
I think that using a library is a good solution.
I mean, look at git-annex's code to retrieve a file from S3:
    retrieve :: S3Handle -> Retriever
    retrieve h = fileRetriever $ \f k p -> liftIO $ runResourceT $ do
        (fr, fh) <- allocate (openFile f WriteMode) hClose
        let req = S3.getObject (bucket info) (bucketObject info k)
        S3.GetObjectResponse { S3.gorResponse = rsp } <- sendS3Handle' h req
        responseBody rsp $$+- sinkprogressfile fh p zeroBytesProcessed
        release fr
      where
        info = hinfo h
        sinkprogressfile fh meterupdate sofar = do
            mbs <- await
            case mbs of
                Nothing -> return ()
                Just bs -> do
                    let sofar' = addBytesProcessed sofar (S.length bs)
                    liftIO $ do
                        void $ meterupdate sofar'
                        S.hPut fh bs
                    sinkprogressfile fh meterupdate sofar'
Only 2 lines of that have anything to do with S3. The entire rest of it
is to do with streaming the bytes out to a file, with progress meter update.
And all of it is written to the abstraction level git-annex needs, which
is "retrieve this Key with progress sent to this MeterUpdate, and
provide a ContentSource to the passed callback action to consume it",
which is not the abstraction level a general-purpose library would need.
(Nor is it a stable abstraction: all of this has been massively reworked once
in the last year, and may well be again. I want flexibility in the
implementation internals so I can make them better; exposing a library API
runs counter to that.)
see shy jo
from datalad.
"It uses a library. I think that using a library is a good solution"
yes -- it is a great solution to use a library whenever it is available ;)
What I meant is that to provide a custom downloader from S3 we will also need to use some library. But by using a library, git-annex already has all the functionality needed to fetch from S3 built in, including a "proper" progress meter update. So wouldn't it be neat (this is all just food for thought, feel free to just say No and close) if my custom special remote could rely on git-annex's built-in functionality, and thus its "standard" progress meter, to fetch from S3, instead of relying on one more library (e.g. s3cmd) on my end?
from datalad.
Since interactions with S3 can be quite tricky anyway, we will just handle them on our side through boto.
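For what it's worth, handling the transfer on our side does not have to mean losing the standard progress meter: git-annex's external special remote protocol lets the remote send PROGRESS messages with the number of bytes transferred so far. Below is a minimal sketch of that idea, not actual datalad code. The names stream_with_progress and annex_progress are hypothetical, the S3 request itself is left out (any file-like object stands in for the response body), and a real remote would still have to answer the TRANSFER request with TRANSFER-SUCCESS or TRANSFER-FAILURE, which is not shown.

```python
import io
import sys


def stream_with_progress(src, dst, report, chunk_size=64 * 1024):
    """Copy src to dst in chunks, reporting cumulative bytes after each write.

    src and dst are binary file-like objects; report is any callable that
    takes the total number of bytes written so far (hypothetical interface).
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
        report(total)
    return total


def annex_progress(out=sys.stdout):
    """Build a reporter that writes 'PROGRESS <bytes>' lines to out.

    Assumption: out is the channel git-annex reads the remote's messages
    from (stdout for an external special remote).
    """
    def report(nbytes):
        out.write("PROGRESS %d\n" % nbytes)
        out.flush()  # git-annex reads line by line, so do not buffer
    return report


if __name__ == "__main__":
    # Stand-in for an S3 response body; a real remote would pass the
    # streaming body obtained from its S3 library instead.
    source = io.BytesIO(b"x" * 150000)
    sink = io.BytesIO()
    stream_with_progress(source, sink, annex_progress())
```

Keeping the copy loop separate from the reporter means the same loop can be driven by whatever the S3 library hands back, and the reporter can be swapped for a no-op in tests.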
from datalad.