igrigorik / em-http-request
Asynchronous HTTP Client (EventMachine + Ruby)
I have 0.2.7 installed with ruby version 1.8.5 and it's breaking on lib/em-http/client.rb line 307 because this version of ruby doesn't have the bytesize method on String.
I know this is an old ruby version but could anything be done like checking respond_to? or making the gem require a high enough ruby version?
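One way to read the `respond_to?` suggestion is a small helper that prefers `bytesize` where String provides it and falls back to `length` otherwise. This is only a sketch, not the gem's code; `byte_length` is a hypothetical name. On Ruby 1.8 strings are plain byte sequences, so `length` already counts bytes there.

```ruby
# Hypothetical helper, not part of em-http-request.
# Prefers String#bytesize and falls back to String#length on
# interpreters that do not define bytesize.
def byte_length(body)
  body.respond_to?(:bytesize) ? body.bytesize : body.length
end

body = "h\u00e9llo"   # "héllo": 5 characters, 6 bytes in UTF-8
puts byte_length(body)
```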
http://github.com/igrigorik/em-http-request/blob/master/lib/em-http/request.rb#L88
Any reason why it's not self? This prevents subclassing HttpRequest without overloading that entire method.
I just started running across this when using ruby 1.9.1-p378 and 1.9.2-rc2.
I am getting the error message from bundler.
I am including the gem like this:
gem 'em-http-request',:git => 'git://github.com/igrigorik/em-http-request.git', :require => 'em-http'
This is the message bundler shows me:
em-http-request at /Users/sean/.rvm/gems/ruby-1.9.1-p378/bundler/gems/em-http-request-b708f594a8d9f1469f1eda21ad58d9718d89a717-master did not have a valid gemspec. This prevents bundler from installing bins or native extensions, but that may not affect its functionality. If you need to use this package without installing it from a gem repository, please contact [email protected] and ask them to modify their .gemspec so it can work with `gem build`. The validation message from Rubygems was: ["README.rdoc"] are not files
HTTPS get requests using em-http-request built from head crash ruby on windows.
C:\Ruby186\bin>gem install igrigorik-em-http-request --source http://gems.github.com/
Building native extensions. This could take a while...
ERROR: Error installing igrigorik-em-http-request:
ERROR: Failed to build gem native extension.
C:/Ruby186/bin/ruby.exe extconf.rb
checking for rb_thread_blocking_region()... no
checking for rb_str_set_len()... no
checking for sys/select.h... no
checking for poll.h... no
checking for sys/epoll.h... no
checking for sys/event.h... no
checking for port.h... no
checking for openssl/ssl.h... no
checking for sysctlbyname() in sys/param.h,sys/sysctl.h... no
creating Makefile
nmake
PKSFX (R) FAST! Self Extract Utility Version 2.04g 02-01-93
Copr. 1989-1993 PKWARE Inc. All Rights Reserved. Shareware version
PKSFX Reg. U.S. Pat. and Tm. Off.
Searching EXE: C:/WINDOWS/SYSTEM32/NMAKE.EXE
Inflating: NMAKE.ERR
Inflating: NMAKE.EXE
Inflating: README.TXT
nmake install
Microsoft (R) Program Maintenance Utility Version 1.50
Copyright (c) Microsoft Corp 1988-94. All rights reserved.
C:\Ruby186\bin\ruby -e "puts 'EXPORTS', 'Init_em_buffer'" > em_buffer-i386-mswin32.def
cl -nologo -I. -I. -IC:/Ruby186/lib/ruby/1.8/i386-mswin32 -I. -MD -Zi -O2b2xg- -G6 -DRUBY_VERSION_CODE=186 -c -Tcem_buffer.c
cl : Command line warning D9035 : option 'Og-' has been deprecated and will be removed in a future release
cl : Command line warning D9002 : ignoring unknown option '-G6'
em_buffer.c
em_buffer.c : fatal error C1902: Program database manager mismatch; please check your installation
NMAKE : fatal error U1077: 'C:\Windows\system32\cmd.exe' : return code '0x2'
Stop.
Hello Ilya,
I hope you're well... I just found a small bug in the way redirects are handled and followed when options[:host] is specified. The connection is re-established against the same server, whereas the redirect target's domain should be resolved again.
The fix is easy: upon redirect, remove options[:host].
I can't wait for EventMachine to actually support async DNS resolution. I'm trying to bribe Aman, but he isn't very responsive. Feel free to push him too :)
Cheers!
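The fix proposed above (dropping options[:host] when following a redirect) could look roughly like this. It is a sketch over a plain options hash; `options_for_redirect` is a hypothetical name and nothing here is taken from the gem's source.

```ruby
# Hypothetical helper: builds the options for the follow-up request
# after a redirect. The explicit :host override is dropped so that the
# hostname in the new Location gets resolved again.
def options_for_redirect(options)
  options.reject { |key, _| key == :host }
end

opts = { :host => "10.0.0.5", :timeout => 10, :redirects => 3 }
p options_for_redirect(opts)
```

The original hash is left untouched, so the first request's settings stay available if they are needed later.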
Hey Ilya,
Any plan to support auto-redirect upon 3XX? That would be neat ;)
Julien
Hey,
I'm not sure why Addressable/uri is better than the default uri lib, but in our case it consumes a LOT more memory (twice as much as ActiveSupport, for example), so I wanted to make sure you're aware of it and ask whether there is any way of reverting to the default lib.
As far as I can tell, the errback isn't being fired when a WebSocket connection dies. Is that intended behavior?
The server:
require 'em-websocket'
EventMachine::WebSocket.start(:host => "0.0.0.0", :port => 8080) do |ws|
  ws.onopen { ws.send "Hello Client!" }
  ws.onmessage { |msg| p "got: #{msg}"; ws.send "Pong: #{msg}" }
  ws.onclose { puts "WebSocket closed" }
end
The Client:
require 'eventmachine'
require 'em-http'
module KBHandler
  include EM::Protocols::LineText2
  def receive_line(data)
    p "Want to send: #{data}"
    p "Error status: #{$http.error?}"
    $http.send(data)
    p "After send"
  end
end
EventMachine.run {
  $http = EventMachine::HttpRequest.new("ws://localhost:8080/").get :timeout => 0
  $http.errback { p 'oops' }
  $http.callback {
    puts "WebSocket connected!"
  }
  $http.stream { |msg|
    puts "Received: #{msg}"
  }
  EM.open_keyboard(KBHandler)
}
To reproduce:
Hi,
I have tried to fetch the feed at http://news.ycombinator.com/rss. Unfortunately, em-http-request seems to have problems with this one. As far as I can tell, the problems are related to HttpClientParser, which cannot parse the headers correctly.
Can you help?
Julien
http://parcbagatelle.com redirects to http://www.parcbagatelle.com:
$ curl -I http://parcbagatelle.com
HTTP/1.1 301 Moved Permanently
Date: Fri, 06 Aug 2010 14:47:27 GMT
Server: Apache
Location: http://www.parcbagatelle.com/
Vary: Accept-Encoding
Connection: close
Content-Type: text/html; charset=iso-8859-1
but when trying this with em-http and the redirect option:
require "em-http-request"
EventMachine.run {
  http = EventMachine::HttpRequest.new("http://parcbagatelle.com").get(:timeout => 10, :redirects => 10)
  http.callback {
    puts http.response_header.status
    puts http.last_effective_url
    puts http.response_header.inspect
    puts "******"
    puts http.response.inspect
    EventMachine.stop
  }
  http.errback {
    puts "FAIL!"
  }
}
The header seems to be right, but the body still belongs to the initial request before the redirect to www.[...]
Something like Net::HTTP::Proxy(proxy_url).start(get_url) {|http| } would be nice to see.
/usr/local/lib/ruby/gems/1.9.1/gems/em-http-request-0.2.7/lib/em-http.rb:13:in `require': /usr/local/lib/ruby/gems/1.9.1/gems/em-http-request-0.2.7/lib/em-http/client.rb:607: invalid multibyte escape: /\000([^\377]*)\377/ (SyntaxError)
invalid multibyte escape: /^\x00|\xff$/
$ ruby --version
ruby 1.9.1p376 (2009-12-07 revision 26041) [i386-darwin10.2.0]
Is the DNS resolution async?
Looks like this (in em-http/client.rb line 307):
head['content-length'] = body.length if body
Should be:
head['content-length'] = body.bytesize if body
on Ruby 1.9. I get a massive drop in the number of XML parsing errors with it changed.
The :timeout option uses comm_inactivity_timeout, which is really only effective for already-established connections (and also, I believe, immediately failed connections, on an internal level). Tonight I added to EM the ability to set 'PendingConnectTimeout', which is really the correct option. When you try to connect to something that doesn't immediately fail (due to connection refused etc.), it will hang beyond the timeout, for up to 50 sec. Can you please include some support for this? Thanks
http://github.com/eventmachine/eventmachine/commit/beab961a5f7b4e546112a51c479c133d310c8bac
Not a bug but a feature request :)
See: http://github.com/evanphx/rubinius/issues/issue/201
Since I'm using bundler to manage my gems in one of my projects, the require 'rubygems' at the top of em-http.rb is not welcome at all: system gems then take precedence over the gems managed by bundler (say, eventmachine-0.12.8 instead of 0.12.10, which causes an error since pending_connect_timeout= is a new method).
Put another way (and I think you're well aware of it :-)), you should not assume your users will use rubygems as their packaging system.
Hey Ilya,
I saw that em-http-request actually supports POST, but I couldn't find any information in the docs on whether I have to "create" the body myself, or whether I can supply a hash, for example, that will be converted into a correctly formatted body.
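If the body has to be built by hand, the standard library can do the form encoding. The sketch below assumes the result would be passed as the :body option of the POST; whether em-http-request performs this conversion itself is not established here, and `form_body` is a hypothetical name.

```ruby
require 'uri'

# Turns a hash of form fields into an
# application/x-www-form-urlencoded string usable as a POST body.
def form_body(params)
  URI.encode_www_form(params)
end

puts form_body("title" => "hello world", "page" => 2)
```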
Hey, great lib - really nice!
One thing, the method in the readme of
http = EventMachine::HttpRequest.new(uri, {
  'X-Title-UUID' => Server.config.TitleUUID,
  'X-SC-Tmx' => hmac(uri, query)
}).post(:body => query)
Does not work. You have to use this form:
http = EventMachine::HttpRequest.new(uri).post(
  :head => {
    'X-Title-UUID' => Server.config.TitleUUID,
    'X-SC-Tmx' => hmac(uri, query)
  },
  :body => query
)
Looks like the problem is in http://github.com/igrigorik/em-http-request/blob/master/lib/em-http/request.rb
The @headers variable is set in initialize() but never used. All that's ever referenced is @options[:head]
Thanks again for the great lib.
-Nate
close_connection does not have any effect while streaming chunks of data, making bailing early impossible.
Here's a tiny patch that adds start_time to HttpClient, so in the callback or errback, you can determine how long the request took
From 92d0a7b7ff11d67ad242eca492c83fe394b92249 Mon Sep 17 00:00:00 2001
From: Paul Barry <[email protected]>
Date: Fri, 8 Jan 2010 22:52:29 -0500
Subject: [PATCH] Added start_time to HttpClient
---
lib/em-http/client.rb | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/lib/em-http/client.rb b/lib/em-http/client.rb
index e7f7764..0363a01 100644
--- a/lib/em-http/client.rb
+++ b/lib/em-http/client.rb
@@ -186,7 +186,7 @@ module EventMachine
CRLF="\r\n"
attr_accessor :method, :options, :uri
- attr_reader :response, :response_header, :errors
+ attr_reader :response, :response_header, :errors, :start_time
def post_init
@parser = HttpClientParser.new
@@ -198,6 +198,7 @@ module EventMachine
@errors = ''
@content_decoder = nil
@stream = nil
+ @start_time = Time.now
@state = :response_header
end
--
1.6.5.2
I've got a program that sorts through a given text document and pulls out all the URIs using URI.extract, then does a multi.add(EventMachine::HttpRequest.new(url).get) for each URL it has found and does some processing.
This works great, except that sometimes the URLs are poorly formatted, such as 'http:/...' or 'http://cnn.com' (no path on that last one, since there is no trailing '/'). I'd like to just log the error and keep going with the other URLs, but execution falls through to EventMachine.error_handler, and there doesn't seem to be a notion of a "keep going" command once it has fallen to that point. What's the best way to handle issues like this using the em-http-request / multirequest gems?
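One possible workaround is to screen the extracted URLs before they ever reach multi.add. The sketch below uses only the standard library; `usable_http_url?` is a hypothetical name, and a URL with a missing path such as 'http://cnn.com' is treated as usable here on the assumption that the path gets normalized separately.

```ruby
require 'uri'

# Hypothetical pre-flight check: true only for http/https URLs that
# parse cleanly and carry a hostname.
def usable_http_url?(url)
  uri = URI.parse(url)
  %w[http https].include?(uri.scheme) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

urls = ["http://cnn.com", "http:/broken", "http://exa mple.com/"]
good, bad = urls.partition { |url| usable_http_url?(url) }

# Log the rejects and carry on with the rest.
bad.each { |url| warn "skipping malformed URL: #{url}" }
p good
```

Only the entries in `good` would then be handed to the multi request, so a malformed URL is logged instead of reaching the error handler.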
em_http_example.rb:
require 'rubygems'
gem 'em-http-request', '0.2.10'
require 'em-http'

def make_http_request(url)
  EventMachine.run do
    http = EventMachine::HttpRequest.new(url).get
    http.callback { EventMachine.stop; return http }
  end
end

make_http_request("http://example.com/")
Output:
$ ruby em_http_example.rb
/Users/mmarston/.rvm/gems/ruby-1.8.7-p299/gems/eventmachine-0.12.10/lib/eventmachine.rb:1425:in `event_callback': recieved ConnectionUnbound for an unknown signature: 2 (EventMachine::ConnectionNotBound)
from /Users/mmarston/.rvm/gems/ruby-1.8.7-p299/gems/eventmachine-0.12.10/lib/eventmachine.rb:263:in `release_machine'
from /Users/mmarston/.rvm/gems/ruby-1.8.7-p299/gems/eventmachine-0.12.10/lib/eventmachine.rb:263:in `run'
from em_http_example.rb:6:in `make_http_request'
from em_http_example.rb:12
HTTP proxies can sometimes skip the initial CONNECT handshake if the client sends the absolute Request-URI instead of just its absolute path on the Request-Line, which can greatly help performance in some applications.
I would love to see this ability make it into em-http-request, and would be happy to take a stab at a patch/branch.
See also http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2
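The difference between the two Request-Line forms can be sketched with the standard library alone. `request_line` and its `via_proxy` flag are hypothetical names for illustration and do not reflect how em-http-request builds its requests.

```ruby
require 'uri'

# Builds an HTTP/1.1 Request-Line. For a request made to a proxy,
# RFC 2616 section 5.1.2 calls for the absolute URI; otherwise only
# the path (plus query) is sent.
def request_line(verb, url, via_proxy)
  uri = URI.parse(url)
  target = via_proxy ? uri.to_s : uri.request_uri
  "#{verb.to_s.upcase} #{target} HTTP/1.1\r\n"
end

print request_line(:get, "http://example.com/feed?page=2", false)
print request_line(:get, "http://example.com/feed?page=2", true)
```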
Using your example from oauth_tweet.rb, I've been stumbling on a few http error 0 (nothing returned) seemingly sporadically. People over at twitter tell me there doesn't seem to be anything wrong with their end.
I've expanded the code quite a bit, but it seems to happen on no particular twitter request. Here's an example of code where the error seems to happen quite often:
request = EventMachine::HttpRequest.new("http://api.twitter.com/statuses/home_timeline.json")
http = request.get(
  :body => {'since_id' => msg['since_id'].to_i},
  :head => {"Content-Type" => "application/x-www-form-urlencoded"}) do |client|
  twitter_oauth_consumer.sign!(client, twitter_oauth_access_token(user.access_token, user.secret_token))
end
I'm not sure if it's due to the oauth library or the em-http-request library... Thoughts?
Background: I'm using Ruby and em-http-request as a bridge between a streaming HTTP interface and Twitter. The stream decodes XML and your gem does so well. However, after posting a message to Twitter, the streaming seems to stop, but only on some occasions. If there is an error with one request, does it stop the others? I will look into it more deeply and try to find a reproducible case, but I wanted to bring it up here.
When iterating through the multi.responses[:failed] array, I realized that connections that resulted in an "unable to resolve server address" error have nil for the uri field.
That way I don't really know WHICH URL crashed without doing some additional bookkeeping before and after, which should probably belong in the framework.
[12:47pm] raggi: that error is raised synchronously atm
[12:47pm] raggi: mind you, i suppose the libs api might not be
[12:49pm] rb2k: hm, so there is basically no way for me to know which url crashed?
[12:49pm] rb2k: unless I save them to an array before going through them
[12:50pm] rb2k: and compare which ones came back
[12:50pm] rb2k: (it's a multi-request)
[12:50pm] rb2k: (em-http)
[12:50pm] raggi: i don't know em-http i'm afraid
[12:51pm] raggi: i could, and maybe should, modify the error that eventmachine raises in em.cpp, to contain the address handed to it
[12:51pm] raggi: but, that's not going to fix the ruby side api in em-http
Since em-http-request now works with rubinius (yay!), it would be nice if there was jruby compatibility too.
There already is a fork at http://github.com/jedediah/em-http-request/commits/master
Since em-http-request will probably be the official http client for eventmachine in the near future, keeping it cross-vm would be awesome! :)
thanks for your work btw!
Hi there,
is it possible to extend em-http-request to support POST&PUT streaming? The reason is to support larger uploads, e.g. 2GB, useful e.g. when uploading files to S3 or attachments to CouchDB.
Thanks,
Michal
From: http://github.com/igrigorik/em-http-request/issues/3/find?comment=82222
An http response can have more than one Set-Cookie header, but only the last is available through response_header.
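One way to keep every Set-Cookie value is to accumulate repeated header names into an array instead of overwriting them. The following is a standalone sketch, not the gem's parser; `collect_headers` and the upper-cased key style are assumptions made for illustration.

```ruby
# Hypothetical header collector, not the gem's parser. Repeated header
# names are accumulated into an array instead of overwriting each other.
def collect_headers(raw_lines)
  headers = {}
  raw_lines.each do |line|
    name, value = line.strip.split(/:\s*/, 2)
    next if value.nil?                 # skip lines without a colon
    key = name.upcase.tr('-', '_')     # e.g. "Set-Cookie" -> "SET_COOKIE"
    if headers.key?(key)
      headers[key] = Array(headers[key]) << value
    else
      headers[key] = value
    end
  end
  headers
end

lines = ["Content-Type: text/html", "Set-Cookie: a=1; path=/", "Set-Cookie: b=2"]
p collect_headers(lines)["SET_COOKIE"]
```

A header that appears once stays a plain string, so callers that only expect single values keep working.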
Ilya,
I hope you had a safe trip back :) It was good to see you again.
I found a weird behavior with that feed : 'http://news.ycombinator.com/rss'
Any idea what's going on with that?
Julien
Hi, I am trying to use this library for a personal project for which I need to login on a given website.
My problem is that when the header is parsed all the SET_COOKIE fields are lost except for the last one and the problem seems to be in the parser itself.
Hey,
It seems that this URL, http://stackoverflow.com:80/feeds/tag/git (or any stackoverflow URL, as a matter of fact), generates Content decoder errors.
Any idea why?
Thanks!
Any 'Authorization' header is automatically being Base64 encoded. While this is correct for Basic HTTP Authentication, it breaks OAuth requests, which require a number of parameters in the 'Authorization' header (unencoded).
http://github.com/godfat/em-http-request/commit/d701e387095736af461ec86e08623fab5484dc71
This would be minor, and depends on:
http://github.com/godfat/em-http-request/commit/07fd595fc1d42de90f4bafb049de23497fc17862
Thanks a lot for considering.
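A possible way to keep Basic auth working without mangling OAuth headers is to encode only when credentials arrive as a [user, password] pair. This is a sketch under that assumption; `authorization_value` is a hypothetical name and the code is not taken from the linked commits.

```ruby
# Hypothetical encoder, not the gem's implementation. A [user, password]
# pair is treated as HTTP Basic credentials and Base64-encoded; a plain
# string (for example an OAuth header) is passed through untouched.
def authorization_value(auth)
  if auth.is_a?(Array)
    "Basic " + [auth.join(':')].pack('m0')   # 'm0' = Base64 without newlines
  else
    auth.to_s
  end
end

puts authorization_value(["user", "secret"])
puts authorization_value('OAuth oauth_consumer_key="abc", oauth_nonce="123"')
```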
An "Invalid request path" ArgumentError occurs after the first redirect when handling a GET request to "ebay.co.uk" with :redirects > 0. GET requests to "google.com" and many other addresses do not seem to redirect correctly either.
I'm not 100% certain, but it seems like bad redirects might be able to crash em-http-request?
These are the errors I've seen so far:
/usr/local/lib/ruby/gems/1.9.1/gems/addressable-2.1.2/lib/addressable/uri.rb:2043:in `validate': Hostname not supplied: 'http:/' (Addressable::URI::InvalidURIError)
from /usr/local/lib/ruby/gems/1.9.1/gems/addressable-2.1.2/lib/addressable/uri.rb:1072:in `port='
from /usr/local/lib/ruby/gems/1.9.1/gems/em-http-request-0.2.10/lib/em-http/http_options.rb:30:in `initialize'
from /usr/local/lib/ruby/gems/1.9.1/gems/em-http-request-0.2.10/lib/em-http/client.rb:403:in `new'
from /usr/local/lib/ruby/gems/1.9.1/gems/em-http-request-0.2.10/lib/em-http/client.rb:403:in `unbind'
from /usr/local/lib/ruby/gems/1.9.1/gems/eventmachine-0.12.10/lib/eventmachine.rb:1417:in `event_callback'
from /usr/local/lib/ruby/gems/1.9.1/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run_machine'
from /usr/local/lib/ruby/gems/1.9.1/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run'
and this one (redirects to Location: http:\\www.ceasturias.com/)
/usr/local/lib/ruby/gems/1.9.1/gems/addressable-2.1.2/lib/addressable/uri.rb:2043:in `validate': Hostname not supplied: 'http:%5C%5Cwww.ceasturias.com/' (Addressable::URI::InvalidURIError)
from /usr/local/lib/ruby/gems/1.9.1/gems/addressable-2.1.2/lib/addressable/uri.rb:1072:in `port='
from /usr/local/lib/ruby/gems/1.9.1/gems/em-http-request-0.2.10/lib/em-http/http_options.rb:30:in `initialize'
from /usr/local/lib/ruby/gems/1.9.1/gems/em-http-request-0.2.10/lib/em-http/client.rb:403:in `new'
from /usr/local/lib/ruby/gems/1.9.1/gems/em-http-request-0.2.10/lib/em-http/client.rb:403:in `unbind'
from /usr/local/lib/ruby/gems/1.9.1/gems/eventmachine-0.12.10/lib/eventmachine.rb:1417:in `event_callback'
from /usr/local/lib/ruby/gems/1.9.1/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run_machine'
from /usr/local/lib/ruby/gems/1.9.1/gems/eventmachine-0.12.10/lib/eventmachine.rb:256:in `run'
from ev.rb:29:in `<main>'
if you run the following example, you will notice that memory is leaked for every request:
https://gist.github.com/d1d031a960af3d3a9e20
tested with Ruby 1.9, Snow Leopard and the latest gem version of em and em-http-request (0.2.7)
Hey,
We're trying to make extensive use of EM and the em-http-request lib... it's great, but we're wondering how we could improve the throughput even more. When fetching, the first step is a DNS resolution, right? Since it is a network operation, I assume it is blocking, isn't it?
Thanks for the reply!
http://github.com/godfat/em-http-request/commit/07fd595fc1d42de90f4bafb049de23497fc17862
Thanks a lot for considering.
em-http-request seems to disregard the timeout if DNS resolution does not return any results.
When trying to download http://www.redditall.com/robots.txt, for example, a multi request will just stall for more than 5 seconds even if I set a timeout of 2 seconds.
Hey,
It's awesome that em-http-request follows redirects. However, there is a small problem when dealing with redirects to a URL that is not compliant. Here is a trace: https://gist.github.com/cc46591705a953aaf7c4
Basically, when the redirected URL doesn't have a valid path (like http://google.com), this fails.
I'm not sure Addressable has this, but URI has a normalize method which is pretty handy for cleaning that up. You could also apply a default path of /, or at least fail gracefully :D
Let me know if you need details!
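The normalize suggestion can be tried with the standard library URI class, whose normalize method fills in "/" for an empty path. This sketch combines it with the "fail gracefully" idea; `normalized_location` is a hypothetical name and this is not how the gem handles redirects.

```ruby
require 'uri'

# Hypothetical helper: parses a redirect Location and gives it a default
# path of "/". Returns nil instead of raising when the value is unusable.
def normalized_location(location)
  uri = URI.parse(location).normalize
  return nil if uri.host.nil?
  uri.to_s
rescue URI::InvalidURIError
  nil
end

p normalized_location("http://google.com")
p normalized_location("http:/")
```

A nil result gives the caller a chance to fail the request through its errback rather than raising inside the event loop.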
When trying to submit both a body and a file part in a POST, this doesn't work, since em-http-request uses either/or logic.
I'm going to create a fork of this to fix it.
How can I determine, when an errback fires, that it was due to a timeout?
The client's @errors variable seems to consistently hold the "" (empty string) value (success or failure).
Many thanks for em-http-request.
as requested http://twitter.com/jmettraux/status/6049776676
Master has everything we (me and @kennethkalmer) need for one of our experiments
Thanks in advance !
mock.rb is not documented afaik. It would be nice to see an example or two of how to use it in our own tests.
I tried to do some benchmarking and started to notice that multi requests will sometimes block forever.
Here is the code I was running:
require 'em-http-request'
require 'uri'
def leech(count)
  puts "leeching: #{count}"
  t = Time.now
  EventMachine.run do
    multi = EventMachine::MultiRequest.new
    count.times do |i|
      url = URI.encode("http://www.google.com:80/search?q=word#{i}&cad=h")
      multi.add(EventMachine::HttpRequest.new(url).get(:timeout => 5))
    end
    multi.callback do
      p multi.responses[:succeeded].size
      p multi.responses[:failed].size
      EventMachine.stop
    end
  end
  puts Time.now - t
end
leech 50
sleep 30
leech 150
sleep 30
leech 300
sleep 30
leech 600
At the same time, I was running this code in a separate Ruby process to watch the established connections:
loop {system("netstat -an | grep ESTABLISHED | grep -c :80"); sleep 0.01}
Everything worked fine until I started to leech 600...
Here is partial output of the second script:
before leech 600...
2
2
2
2
2
leech 600 starts..
87
437
437
436
471
508
508
507
503
494
483
497
510
505
486
479
455
428
394
371
352
330
303
238
150
70
44
36
26
24
22
18
16
15
13
8
4
4
4
4
4
3
3
3
3
after 10 minutes still 3 established connections and the benchmark script is still running...
Output of the bench script itself:
leeching: 50
50
0
2.994171
leeching: 150
150
0
12.221699
leeching: 300
35
265
15.342878
leeching: 600
nothing happens here..
You can also see that there were already a lot of failures with leech 300.
So, for some reason, something dies or stops responding (the OS, the router, the host I'm leeching from, or whatnot).
One idea I had was to slice the URLs into batches and sleep a little between leeches. Something like this:
require 'em-http-request'
require 'uri'
require 'enumerator'
urls = []
600.times {|i| urls << URI.encode("http://www.google.com:80/search?q=word#{i}&cad=h")}
urls.each_slice(25) do |bulk_urls|
  EventMachine.run do
    multi = EventMachine::MultiRequest.new
    bulk_urls.each {|url| multi.add(EventMachine::HttpRequest.new(url).get(:timeout => 5))}
    multi.callback do
      p multi.responses[:succeeded].size
      p multi.responses[:failed].size
      EventMachine.stop
    end
  end
  sleep 0.5 # little sleeping doesn't harm
end
Again, it hangs. Output of the first script:
25
0
25
0
25
0
25
0
25
0
25
0
hangs here... so after leeching ~6*25 times...
And the netstat output like this:
3
20
3
28
4
3
28
5
3
25
4
3
21
4
3
16
3
26
16
16
16
... # 16 all the time
16
16
16
still continuing at 16...
I would like to suggest that it should be possible to set some kind of bulk threshold limit for multi requests, to avoid overloading anything while still allowing multiple concurrent requests.
It seems that I'm currently unable to get it working consistently...
Also, I'm using:
ruby 1.8.6 (2010-02-04 patchlevel 398) [i386-mingw32]
Tried these scripts on Win7 and Windows XP and got similar results.
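The "bulk threshold" idea can be sketched as a small limiter that never has more than a fixed number of jobs in flight. `BoundedRunner` is a hypothetical class written for illustration; it is not part of em-http-request or EventMachine, and the stand-in jobs below only record their completion callbacks instead of issuing HTTP requests.

```ruby
# Hypothetical limiter. Starts at most `limit` jobs at once; each job
# receives a `done` callable and must invoke it when its work (for
# example an HTTP request) has finished.
class BoundedRunner
  attr_reader :active

  def initialize(limit)
    @limit = limit
    @pending = []
    @active = 0
  end

  def add(&job)
    @pending << job
    pump
  end

  private

  # Start queued jobs while there is spare capacity.
  def pump
    while @active < @limit && (job = @pending.shift)
      @active += 1
      job.call(method(:finish))
    end
  end

  # Called by a job when it completes; frees a slot and starts the next.
  def finish
    @active -= 1
    pump
  end
end

# Usage with stand-in jobs that only record their `done` callbacks.
started = []
runner = BoundedRunner.new(2)
5.times { |i| runner.add { |done| started << [i, done] } }
puts started.size        # two jobs started, three still queued
started.first[1].call    # the first job reports completion
puts started.size        # a third job has now been started
```

In an EventMachine program, each job would presumably issue one request and call `done` from both its callback and its errback, so that a failed request also frees its slot.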