Giter Club home page Giter Club logo

Comments (25)

taf2 avatar taf2 commented on August 15, 2024

Please try again with 0.6.6, setting the new attribute Curl::Multi.default_timeout = 1

from curb.

igrigorik avatar igrigorik commented on August 15, 2024

Curl version: * $Id: curl.h,v 1.379 2009-03-02 23:05:31 bagder Exp $

Default timeout appears to be set at 100ms. Applied the patch, set timeout to 1ms, running now - will take a bit of time to get a better picture. Just eyeballing the last minute of stats though, it does seem that it's going faster, but still doesn't match the 0.4.6.x branches.

Will try reverting back a few commits to see where the problem is introduced:

http://github.com/taf2/curb/commit/1be75e3dee822e5e760d21468d1d7b3b0472fcf4  -> use an internal Hash instead of static variables to store ruby objects... the thought here was it should speed up Curl::Easy.new -> yet to be proven though..
http://github.com/taf2/curb/commit/4bdc81576ae7483f130a4eec17c73373927614fa -> timeout refactored

Other commits that may have impacted performance:
http://github.com/taf2/curb/commit/35cb65b03f7b386fe6c7f7068b7ac7622c6e1820
http://github.com/taf2/curb/commit/6af516cb31909610f4e6b9f4302aeb49bfaeed49

This one maybe if you're not running the loop through a multiple handle and not running multiple threads... http://github.com/taf2/curb/commit/d0944bca9a573fa8bbe2f7691d421516738a25bb

from curb.

igrigorik avatar igrigorik commented on August 15, 2024

Nope, must be something else - throughput stayed the same after applying the timeout patch.

from curb.

igrigorik avatar igrigorik commented on August 15, 2024

Ok, ran bisect on it, and it's definitely the "cleanup timeout commit":

4bdc81576ae7483f130a4eec17c73373927614fa is first bad commit
commit 4bdc81576ae7483f130a4eec17c73373927614fa
Author: Todd Fisher 
Date:   Sun Jul 26 16:31:52 2009 -0400

    cleanup timeout code in multi select loop

:040000 040000 449d377da6cad320dca006253e2318ffb5e792e7 a7b14dc5524c6dfb6577df3214aabbb97ce69665 M  ext

bisect run:

git bisect start
# bad: [ac0b465392429caaaec7ca2eadf98fe2c05603cb] ready for release
git bisect bad ac0b465392429caaaec7ca2eadf98fe2c05603cb
# good: [6c133998302ea9ad0c5146880cbd7fa8585edf6e] push out new gem release
git bisect good 6c133998302ea9ad0c5146880cbd7fa8585edf6e
# bad: [1745d46fcf1e5c55276be558b64ed0644ed612a2] compat with non GNU strip binaries
git bisect bad 1745d46fcf1e5c55276be558b64ed0644ed612a2
# bad: [a46607c037576bcca3c782ee32159f9dd46f5e10] update other request types to yield once
git bisect bad a46607c037576bcca3c782ee32159f9dd46f5e10
# bad: [4bdc81576ae7483f130a4eec17c73373927614fa] cleanup timeout code in multi select loop
git bisect bad 4bdc81576ae7483f130a4eec17c73373927614fa
# good: [c351e53e30eff1eae119fbd79d54b37da64930cc] make sure to set CURLOPT_NOBODY as reported by  copiousfreetime
git bisect good c351e53e30eff1eae119fbd79d54b37da64930cc
# good: [ab433db38df1e86cfc0261796a96fe1cb3cde209] add another test case for 500 responses
git bisect good ab433db38df1e86cfc0261796a96fe1cb3cde209
# good: [4233fcc6f65f1107693014038da9320cbdee5c5e] set upload object to Qnil on cleanup
git bisect good 4233fcc6f65f1107693014038da9320cbdee5c5e
# good: [50bdc2beed1c89c903145c05cc00344b9a7ac9da] add inspect method
git bisect good 50bdc2beed1c89c903145c05cc00344b9a7ac9da

Latest patches did not fix (tried setting default_timeout to 1ms, same results).

from curb.

taf2 avatar taf2 commented on August 15, 2024

I know it's been a while, but wondering if you can try with this last commit. http://github.com/taf2/curb/commit/f001b0ead7d7df1f2289087e8527d4d1183a4b6d

from curb.

igrigorik avatar igrigorik commented on August 15, 2024

Hmm, nope: http://skitch.com/igrigorik/dnxc4/ganglia-comments-keystone-i-140d0e7c-host-report

:-(

from curb.

ramsingla avatar ramsingla commented on August 15, 2024

taf2 I have made some changes to my fork for the Curl::Multi
These changes have made the running of tests significantly faster. Almost 2-3 seconds to run all the units tests, which used to take 8-9 seconds.

You can have a look at this patch.

http://github.com/ramsingla/curb/commit/d56853bd435c71f4a68a1557bf57d3a1b96a5698

There are couple of things regarding the timeout I had change that could be kept the same. I am sending you the pull request.

from curb.

taf2 avatar taf2 commented on August 15, 2024

Thanks! I'll review tonight

from curb.

ramsingla avatar ramsingla commented on August 15, 2024

Also this patch provides the issue with adding more requests to the same multi handler on_complete callbacks dynamically. Often it was observed that the requests where not performed. This is important in the application when some serialization is required amongst some requests. In these cases, the perform call had to be called again for the requests which imho is not desired behavior.

from curb.

taf2 avatar taf2 commented on August 15, 2024

Okay, after applying the patch my tests went from taking 4.2 seconds to 5.2 seconds... not sure test speed in this case is a good measure... but what I do see is we're not relying on libcurl to tell us how long to wait... that does make the loop much simpler. I like the additional do while loop to make the calling ruby code simpler. I imagine that even remains backwards compatible with the previous behavior... I made one minor formatting change, will spin a new release with this code. Maybe igrigorik could run his tests again to see if this improves performance?

from curb.

ramsingla avatar ramsingla commented on August 15, 2024

Yes you are right. I was experimenting first with the timeout. But the real change was the additional do while loop and making the rb_curl_multi_run not to read on every all. But regarding the performance, it might be better if the libcurl timeout can be introduced back. On Max OS X i saw 3 to 4 fold improvement but it might be because curl on my system did not supported the libcurl timeout.

from curb.

igrigorik avatar igrigorik commented on August 15, 2024

Hmm, nope :-(

http://skitch.com/igrigorik/de9m5/photoshop

from curb.

ramsingla avatar ramsingla commented on August 15, 2024

@igrigorik: is there any improvement? from the charts it seems the cpu usage has gone low and throughtput is slightly better than before. are you using libcurl version less than 7.19.x?
@taf2: libcurl timeout could make the difference. was it there in 0.4.6.0?

from curb.

igrigorik avatar igrigorik commented on August 15, 2024
curl 7.19.4 (x86_64-unknown-linux-gnu) libcurl/7.19.4 OpenSSL/0.9.8k zlib/1.2.3 c-ares/1.6.0
Protocols: tftp ftp telnet dict http file https ftps 
Features: AsynchDNS Largefile NTLM SSL libz

The new gem is the valley.. the download speed drops to about half.

from curb.

taf2 avatar taf2 commented on August 15, 2024

igrigorik, I created this script to identify some perf issues: http://gist.github.com/446279 On my mac, I identified that openssl is causing creation of Curl::Easy handles to take a very long time, the offending call is in curl_easy_init - running that script will identify the issue... I don't think you have that issue - but maybe see?

from curb.

taf2 avatar taf2 commented on August 15, 2024

Okay, still sort of shooting in the dark, but this commit has had a slight improvement in test run time for me... http://github.com/taf2/curb/commit/620ca1669ad41c3ad9e27434788e1c84740655f7
shaved a second off...

from curb.

taf2 avatar taf2 commented on August 15, 2024

I added some benchmarks based on the persistent connection tests brought to attention by drbrain - i should add em-http to that as well?

from curb.

ramsingla avatar ramsingla commented on August 15, 2024

interesting thing about the chart from the new version is that the CPU utilization decreases. I will have close have close look at 0.4.6 code. My only guess is that Curl::Easy instance in newer code uses Curl::Multi Could that make the difference?
@igrigorik are you using Curl::Multi or Curl::Easy?

from curb.

ramsingla avatar ramsingla commented on August 15, 2024

@taf2: please look at the difference of multi perfom function in both versions
http://gist.github.com/446415
imho, igrigorik you might be adding more to curl multi using a block with perform call. block call is called in every iteration in the old version and in newer version only if the rb_thread_select returns 0. so it might be that more easy handles are being added on the fly to same multi handler by igrigorik's script and that i can relate with the CPU usage difference. Can this be the source of performance slowdown?

from curb.

igrigorik avatar igrigorik commented on August 15, 2024

Hey guys, sorry didn't get a chance to test this yet.. but I like your last theory, makes sense, conceptually.

from curb.

ramsingla avatar ramsingla commented on August 15, 2024

igrigorik, if this makes sense conceptually then it might be that you need to alter the implementation of the code somehow to make it better with newer version, which is more efficient ( Ideally the block call should be made only if the select returns 0 ). how many urls are you adding to Curl::Multi in a go? Can you double the number and see if the performance with the newer version is better or worse? Also could you add new urls with on_complete callbacks?

from curb.

taf2 avatar taf2 commented on August 15, 2024

Let's try again with this change maybe? I remember removing the yield block believing it made sense to focus the loop more on immediate network activity - but rb_thread_select in favor of code in the perform block might be the wrong decision... if there is code in the perform block it should probably run in favor of other ruby threads... Here's the commit: http://github.com/taf2/curb/commit/ac07429c7a043f43b2e058a4b5be9f9559eaac42

from curb.

igrigorik avatar igrigorik commented on August 15, 2024

Awesome. Ram, your theory is right on the money. With the latest patch the perf is definitely back into the same +-/10% range. I also converted the code to queue up the next easy handle in the on_success handler instead of the perform, and that did seem to help a bit as well. Everything combined, we're back on the same throughput numbers. Incidentally, the CPU usage has remained the same, no changes.

I think we can finally close this guy! Although.. I do wonder how the docs can/should be updated to reflect this learning, as I'm sure others may run into the same issue (even worse, without even knowing it).

from curb.

taf2 avatar taf2 commented on August 15, 2024

Alright, I updated the README files. Closing this one out. I'll work on a new release soon.

from curb.

ramsingla avatar ramsingla commented on August 15, 2024

cheers, finally we got this thing right.
igrigorik if only on_success callback is use to add more to the queue then it should be low on cpu. ( when one job finishes add another ). Otherwise, with the latest patch you can yield aggressively by providing block to the perform call ( i think this should be high on cpu ).

from curb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.