Comments (11)

krobertson commented on June 25, 2024

I have some code for doing parallel uploads to S3 in an internal repo and will see if I can port it into deb-s3.

from deb-s3.

htmldoug commented on June 25, 2024

Thanks for the response! If you're talking about parallel uploads within deb-s3 that'd be cool, although not quite what I meant by this issue. The use-case I'm hitting is running multiple instances of deb-s3 --bucket foo --codename bar simultaneously on many boxes.

Parallelizing the S3 transfers would help to a point, but would put more work into the single-track than the solution I'm experimenting with. As long as the parallel-safe /pool transfers are lumped in with the unsafe /dists alterations, I'm not sure it could be as fast as splitting them apart. Totally understand if that's more complexity than you want to incorporate though. I don't mind maintaining our own fork.

Regarding parallel uploads, I can say that I saw a 90% speedup after hooking up https://github.com/grosser/parallel to verify with 20 threads. According to this article, S3 bandwidth should scale up nicely with parallel uploads as well, so that'd be a nice move too.
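
For anyone who wants to try the threaded idea without pulling in a gem, here is a rough sketch using only Ruby's standard library. It is not the code from my fork and not the parallel gem's API: `each_in_threads` and `fake_verify` are made-up names, and the sleep is just a stand-in for a per-package S3 request.

```ruby
# Hypothetical sketch: these names are illustrative, not part of deb-s3.
# Runs the given block once per item on a fixed-size pool of threads and
# returns a hash of item => block result. Items must not be nil or false.
def each_in_threads(items, thread_count: 20)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # once drained, pop returns nil and the workers stop

  results = Queue.new
  workers = Array.new([thread_count, items.size].min) do
    Thread.new do
      while (item = queue.pop)
        results << [item, yield(item)]
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }.to_h
end

# Stand-in for a per-package S3 check; sleeps to mimic network latency.
fake_verify = ->(key) { sleep 0.01; key.end_with?(".deb") }

keys = (1..40).map { |i| "pool/a/app/app_#{i}_amd64.deb" }
statuses = each_in_threads(keys, thread_count: 20) { |key| fake_verify.call(key) }
puts "#{statuses.count { |_, ok| ok }} of #{keys.size} verified"
```

Threads work well here because the time goes to waiting on the network rather than to CPU work, so the thread count mostly controls how many requests are in flight at once.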

krobertson commented on June 25, 2024

Simultaneous execution from multiple hosts is completely different and very easy to get wrong. It is a hard problem to actually do right, because multiple hosts will be writing to the same location on a remote API. You'd be looking at distributed locking/versioning, and ensuring two hosts don't make the call at the same time is hard. I don't plan on trying to enable that because it wasn't a primary use case. You can whip it up for controlled usage, but it is hard to actually get it right. Even with the linked commit with the path, it requires more work to do it across many physical systems vs many processes on a single system.

htmldoug commented on June 25, 2024

Oh yeah, totally agree. Didn't even occur to me to have deb-s3 do locking. That'd be crazy hard with S3's eventual consistency.

I just split the parts of deb-s3 that needed to be locked, from the parts that didn't need to be locked and was happy to let Jenkins manage the mutexing since it does that very easily.
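
Roughly, the split looks like the sketch below. This is not the fork's code: `upload_pool`, `update_dists`, the in-memory `store` hash, and the lock file are all stand-ins I made up to show the shape. The local file lock only plays the role that the Jenkins mutex plays for us; it would not protect anything across hosts.

```ruby
require "tmpdir"

# Hypothetical sketch: none of these names come from deb-s3.
# Phase 1 writes only per-package keys under pool/, so concurrent
# runs with different packages do not overwrite each other.
def upload_pool(store, packages)
  packages.each { |name, bytes| store["pool/#{name}"] = bytes }
end

# Phase 2 rewrites the shared index under dists/. It is a
# read-modify-write, so it has to be serialized.
def update_dists(store, packages, lock_path)
  File.open(lock_path, File::RDWR | File::CREAT) do |lock|
    lock.flock(File::LOCK_EX) # local stand-in for an external mutex
    index = store.fetch("dists/Packages", [])
    store["dists/Packages"] = (index + packages.keys).uniq.sort
  end
end

store = {} # in-memory stand-in for the S3 bucket
lock_path = File.join(Dir.tmpdir, "deb-s3-sketch.lock")

builds = [
  { "app_1.0_amd64.deb" => "..." },
  { "lib_2.1_amd64.deb" => "..." }
]
builds.each { |pkgs| upload_pool(store, pkgs) }             # safe to overlap
builds.each { |pkgs| update_dists(store, pkgs, lock_path) } # one at a time

puts store["dists/Packages"].inspect
```

The point of the split is that only the second, short phase needs the lock, so the slow package transfers never wait on each other.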

guilhem commented on June 25, 2024

@htmldoug #59 can help you implement this.

I have a function upload_package_s3 that loops with each.
I don't know how to parallelize in Ruby, but it may be easy to do.

krobertson commented on June 25, 2024

If implemented, I think it would make sense through a plugin/extension, so maybe adding a way to load a ~/.deb-s3.rb file or something along those lines. Then you could inject something that used Redis or memcached to act as a distributed lock.
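
To make the idea concrete, something like the sketch below is what I have in mind. Nothing here exists in deb-s3: `DebS3Sketch`, `load_extension`, and `update_index` are invented for illustration, the extension is written to a temp file rather than read from a home directory, and the Mutex is only a single-process stand-in for a Redis- or memcached-backed lock.

```ruby
require "tempfile"

# Hypothetical sketch: DebS3Sketch and its methods are invented for illustration.
module DebS3Sketch
  # Default lock does nothing, matching a tool with no built-in locking.
  class NullLock
    def synchronize
      yield
    end
  end

  class << self
    attr_writer :lock

    def lock
      @lock ||= NullLock.new
    end

    # Loads an optional user file, which may assign DebS3Sketch.lock.
    def load_extension(path)
      load path if File.exist?(path)
    end

    # Runs the index update inside whatever lock is configured.
    def update_index
      lock.synchronize { yield }
    end
  end
end

# Simulate a user's extension file that swaps in its own lock object.
Tempfile.create(["deb-s3-ext", ".rb"]) do |file|
  file.write("DebS3Sketch.lock = Mutex.new\n")
  file.flush
  DebS3Sketch.load_extension(file.path)
end

index = []
DebS3Sketch.update_index { index << "app_1.0_amd64.deb" }
puts index.inspect
```

Any object that responds to synchronize with a block would fit, so the tool itself would not need to know which locking backend is behind it.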

guilhem commented on June 25, 2024

@krobertson
There is no need to do this with my work.

Since I handle the manifest, release, and packages separately, the package upload happens in one single place.
So that part could be made parallel behind an option.

htmldoug commented on June 25, 2024

@krobertson Adding any kind of locking to deb-s3 seems like a bad idea. My unsolicited advice is that it's better to let this be done externally so users don't ask "does a plugin exist to support my external locking system." I'd have deb-s3 do one thing and do it well.

To that end, all that's needed to support external locking strategies would be to allow deb-s3 to also do "half a thing" and do it well. Namely upload to /pool in one run, and upload to /dists in a second.

My fork has been doing exactly that on our prod application build for a week and it's gone from one 10-minute deb-s3 call to a 1 minute 30 second (multiple-host-safe) /pool upload and a 1 minute (globally synchronized) /dists upload.

For us, that's the difference between one big deb-s3 upload...
[image: before]

... and split up deb-s3 upload *pool only* and deb-s3 upload *dist only* calls...
[image: after]

I'm not sure how to explain it any more clearly.

Are you interested in incorporating an approach like this? If so, I'd be happy to clean up my code and contribute a PR for your review. Otherwise, I'll stop trying to explain and leave you to whatever approach you think is best.

Either way, I'd be happy to buy you a beer next time I'm in SF or you're in DC for contributing this to the open source community.

@guilhem Check out parallel. Very simple and great docs. Here's how we're using it. Seeing huge gains.

Cheers!

krobertson commented on June 25, 2024

I'll take another look at it this weekend... my plugin suggestion was mostly around loading a ruby file and letting someone monkey patch, if they really wanted to. But I agree it is not necessarily ideal.

alexmreis commented on June 25, 2024

I've added locking support which should make the multi-host upload safer. It doesn't do anything in parallel (ruby is really bad at doing things in parallel anyway unless you use something like active_job or resque). See pull request #89.

alexmreis commented on June 25, 2024

BTW eventual consistency is not a problem unless you use the US Standard region. See http://shlomoswidler.com/2009/12/read-after-write-consistency-in-amazon.html
