geventhttpclient's Introduction

geventhttpclient

A high-performance, concurrent HTTP client library for Python using gevent.

Support in gevent.httplib for patching http.client was removed in gevent 1.0; geventhttpclient now provides that missing functionality.

geventhttpclient uses a fast HTTP parser, written in C.

geventhttpclient has been specifically designed for high concurrency, streaming, and HTTP 1.1 persistent connections. More generally, it is designed for efficiently pulling from REST APIs and streaming APIs like Twitter's.

Safe SSL support is provided by default. geventhttpclient depends on the certifi CA Bundle. This is the same CA Bundle which ships with the Requests codebase, and is derived from Mozilla Firefox's canonical set.

Since version 2.3, geventhttpclient features a largely requests-compatible interface. It covers basic HTTP usage, including cookie management, form data encoding, and decoding of compressed data, but otherwise isn't as feature-rich as the original requests. For simple use cases, it can serve as a drop-in replacement.

import geventhttpclient as requests
requests.get("https://github.com").text
requests.post("http://httpbingo.org/post", data="asdfasd").json()

from geventhttpclient import Session
s = Session()
s.get("http://httpbingo.org/headers").json()
s.get("https://github.com").content

This interface builds on top of the lower level HTTPClient.

from geventhttpclient import HTTPClient
from geventhttpclient.url import URL

url = URL("http://gevent.org/")
client = HTTPClient(url.host)
response = client.get(url.request_uri)
response.status_code
body = response.read()
client.close()

httplib compatibility and monkey patch

The geventhttpclient.httplib module contains drop-in replacements for the http.client connection and response classes. If you use http.client directly, you can replace the httplib imports with geventhttpclient.httplib.

# from http.client import HTTPConnection
from geventhttpclient.httplib import HTTPConnection

If you use httplib2, urllib or urllib2, you can patch httplib to use the wrappers from geventhttpclient. For httplib2, make sure you patch before you import it, or the super() calls will fail.

import geventhttpclient.httplib
geventhttpclient.httplib.patch()

import httplib2

High Concurrency

HTTPClient has a connection pool built in and is greenlet safe by design. You can use the same instance among several greenlets. It is the low level building block of this library.

import gevent.pool
import json

from geventhttpclient import HTTPClient
from geventhttpclient.url import URL


# go to http://developers.facebook.com/tools/explorer and copy the access token
TOKEN = "<MY_DEV_TOKEN>"

url = URL("https://graph.facebook.com/me/friends", params={"access_token": TOKEN})

# setting the concurrency to 10 allows the client to create up to
# 10 connections and reuse them.
client = HTTPClient.from_url(url, concurrency=10)

response = client.get(url.request_uri)
assert response.status_code == 200

# the response complies with the read protocol, so the stream is
# passed to the json parser as it's being read.
data = json.load(response)["data"]

def print_friend_username(client, friend_id):
    friend_url = URL(f"/{friend_id}", params={"access_token": TOKEN})
    # the greenlet will block until a connection is available
    response = client.get(friend_url.request_uri)
    assert response.status_code == 200
    friend = json.load(response)
    if "username" in friend:
        print(f"{friend['username']}: {friend['name']}")
    else:
        print(f"{friend['name']} has no username.")

# allow 20 greenlets to run at a time. This is more than the concurrency
# of the http client, but that isn't a problem since the client has its
# own connection pool.
pool = gevent.pool.Pool(20)
for item in data:
    friend_id = item["id"]
    pool.spawn(print_friend_username, client, friend_id)

pool.join()
client.close()

Streaming

geventhttpclient supports streaming. Response objects have read(n) and readline() methods that read the stream incrementally. See examples/twitter_streaming.py for an example of pulling from the Twitter streaming API.

Here is an example of how to download a big file chunk by chunk to save memory:

from geventhttpclient import HTTPClient, URL

url = URL("http://127.0.0.1:80/100.dat")
client = HTTPClient.from_url(url)
response = client.get(url.request_uri)
assert response.status_code == 200

CHUNK_SIZE = 1024 * 16 # 16KB
with open("/tmp/100.dat", "wb") as f:  # binary mode: read() returns bytes
    data = response.read(CHUNK_SIZE)
    while data:
        f.write(data)
        data = response.read(CHUNK_SIZE)
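
The chunked loop above can be factored into a small helper. The following is a minimal sketch using only the standard library; copy_stream is a hypothetical name, not part of geventhttpclient, and it assumes the source object's read(n) returns an empty bytes object at the end of the stream, as ordinary file objects do. An io.BytesIO stands in for the response so the sketch runs without a server.

```python
import io


def copy_stream(response, fileobj, chunk_size=16 * 1024):
    """Copy a readable object to fileobj chunk by chunk; return bytes written."""
    total = 0
    # iter() with a sentinel keeps calling read() until it returns b""
    for chunk in iter(lambda: response.read(chunk_size), b""):
        fileobj.write(chunk)
        total += len(chunk)
    return total


# Stand-in for a response body: any object with read(n) works here
fake_response = io.BytesIO(b"x" * 40000)
sink = io.BytesIO()
written = copy_stream(fake_response, sink)
```

Only one chunk is held in memory at a time, so peak memory use depends on chunk_size rather than on the size of the download.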

Benchmarks

The benchmark runs 10000 GET requests against a local nginx server in the default configuration with a concurrency of 10. See the benchmarks folder. The requests per second (RPS) for a few popular clients are given in the table below. Please read benchmarks/README.md for more details. Also note that HTTPX is better used with asyncio, not gevent.

HTTP Client          RPS
GeventHTTPClient     7268.9
Httplib2 (patched)   2323.9
Urllib3              2242.5
Requests             1046.1
Httpx                 770.3

Linux(x86_64), Python 3.11.6 @ Intel i7-7560U

License

This package is distributed under the MIT license. Previous versions of geventhttpclient used http_parser.c, which in turn was based on http/ngx_http_parse.c from NGINX, copyright Igor Sysoev, Joyent, Inc., and other Node contributors. For more information, see http://github.com/joyent/http-parser

geventhttpclient's People

Contributors

amorgun, bollwyvl, cloudaice, cyberw, flyingbutter, graingert, gwik, heyman, jimmyr, joshblum, krallin, lichray, llabatut, lucidfrontier45, magupov, methane, mgiessing, miedzinski, ml31415, monsterxx03, northisup, ojomio, rmohr, sandrotosi, sbraz, scarabeusiv, strakh, thanethomson, timclicks, timgates42

geventhttpclient's Issues

SSL fails to validate host.

On running this code (variation of sample in README), I am successfully able to retrieve data from a host I know to use an untrusted CA, without any errors or exception:

#!/usr/bin/python

from geventhttpclient import HTTPClient
from geventhttpclient.url import URL

url = URL('https://www.mail.ufu.br/')

http = HTTPClient(url.host)

# issue a get request
response = http.get(url.path)

# read status_code
response.status_code

# read response body
body = response.read()
print body

# close connections
http.close()

How am I supposed to know that the host is trusted?

Robust proxy support & suggestion

Hello, I'm interested in adding complete proxy support for geventhttpclient. That would include SOCKS 4, SOCKS 4A, SOCKS 5, and of course HTTP proxies.

I'd like to integrate the SocksiPy module into it, specifically this branch that I maintain: https://github.com/Anorov/PySocks/

I've confirmed that the code plays nice with gevent when monkey patched with gevent.monkey.patch_all(). Do you know if it would be worth the effort to replace all references to socket in PySocks with gevent.socket instead of monkey patching, or does monkey patching do it all just fine with no real speed concerns?

I could also just try to convert all the proxy handling code to C, but I'm not sure if it would be worth it. Tell me your thoughts about that. Do you think the current branch would get a decent speed-up if it was written in C, or should I just integrate it with geventhttpclient as is?

Anyway, I think a new class in connectionpool.py subclassing ConnectionPool, maybe called ProxiedConnectionPool, could be used.

_create_socket() could probably be overridden to do something like this:

sock_info = self._resolve()
sock = socks.socksocket()
# self.proxy_args is hypothetical: the proxy settings, probably passed to HTTPClient initially
sock.setproxy(*self.proxy_args)
sock.connect(sock_info[-1])
return sock

What do you think? Just wanted to get some input from you before I start modifying a fork.

Add prebuilt wheels to PyPI

Would you consider uploading prebuilt python wheels to PyPI (for Linux, macOS & Windows)?

I have a branch in my fork where I've managed to get Appveyor to automatically build Windows wheels for geventhttpclient with the use of cibuildwheel. It was actually pretty easy to set up: heyman@5d21a29

Here's the Appveyor build: https://ci.appveyor.com/project/heyman/geventhttpclient

I can improve on that branch and get it to also build wheels for Linux and macOS on Travis CI. The only thing you would then need to do is download the prebuilt wheels and upload them to PyPI when making a new release. Would you be interested in such a pull request?

The reason I'm interested in this is because I'm working on a new HTTP client for Locust that'll use geventhttpclient, and we don't want to add a dependency on having a build environment set up for people who want to install Locust: locustio/locust#713

Edit:
It's also possible to set up Travis CI and Appveyor to automatically upload the built wheel files when a new git tag is pushed, if one is willing to trust Travis and Appveyor with one's PyPI credentials, or at least with credentials for an account that has upload access to the PyPI repo.

Attribute 'addinfourl.headers' doesn't hold actual headers

After monkey patching, urllib2 responses lose headers:

>>> import geventhttpclient.httplib
>>> geventhttpclient.httplib.patch()
>>> import urllib2
>>> resp = urllib2.urlopen('http://icanhazip.com/')
>>> resp.headers
'OK'

This happens because geventhttpclient.response.HTTPResponse.msg is a plain string (the reason phrase), whereas httplib.HTTPResponse.msg is an instance of httplib.HTTPMessage, which is a mapping from header fields to their values.
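
To illustrate the difference, here is a minimal sketch using the Python 3 standard library (http.client rather than the Python 2 httplib named in the report). The header bytes are made up for the example. It shows what callers such as urllib expect from msg: a mapping with case-insensitive header lookup, which a plain reason string cannot provide.

```python
import http.client
import io

# Made-up header block, terminated by a blank line as on the wire
raw_headers = b"Content-Type: text/plain\r\nContent-Length: 12\r\n\r\n"

# What the standard library stores in HTTPResponse.msg: an HTTPMessage
msg = http.client.parse_headers(io.BytesIO(raw_headers))
content_type = msg["content-type"]  # lookups are case-insensitive

# What the report describes: only the reason phrase, a plain string
reason = "OK"
has_mapping_api = hasattr(reason, "get_all")  # False, str has no header API
```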

"invalid character in header" Exception with some websites

for example http://papileon.at/
Any web browser can open it with no problem.

But...

ua = UserAgent()

r=ua.urlopen("http://papileon.at/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/geventhttpclient/useragent.py", line 332, in urlopen
    e = self._handle_error(e, url=req.url)
  File "/usr/lib64/python2.7/site-packages/geventhttpclient/useragent.py", line 327, in urlopen
    resp = self._urlopen(req)
  File "/usr/lib64/python2.7/site-packages/geventhttpclient/useragent.py", line 274, in _urlopen
    body=request.payload, headers=request.headers)
  File "/usr/lib64/python2.7/site-packages/geventhttpclient/client.py", line 167, in request
    block_size=self.block_size, method=method.upper(), headers_type=self.headers_type)
  File "/usr/lib64/python2.7/site-packages/geventhttpclient/response.py", line 277, in __init__
    super(HTTPSocketPoolResponse, self).__init__(sock, **kw)
  File "/usr/lib64/python2.7/site-packages/geventhttpclient/response.py", line 149, in __init__
    self._read_headers()
  File "/usr/lib64/python2.7/site-packages/geventhttpclient/response.py", line 170, in _read_headers
    self.feed(data)
_parser.HTTPParseError: ('invalid character in header', 21)

LoopExit: This operation would block forever

Please, help me understand (and possibly fix) what is happening here... I have a Django application that makes POST requests to another local application, when treating specific user requests. I am trying to use geventhttpclient to do the job, but I can't understand what is wrong. In the first request Django makes, everything goes right. In the second, I get this exception "LoopExit: This operation would block forever":

Traceback (most recent call last):
  File "/home/lucas/projetos/diggems/cpython_env/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/home/lucas/projetos/diggems/cpython_env/local/lib/python2.7/site-packages/django/db/transaction.py", line 223, in inner
    return func(*args, **kwargs)
  File "/home/lucas/projetos/diggems/game/views.py", line 350, in move
    post_update(game.channel, result)
  File "/home/lucas/projetos/diggems/game/game_helpers.py", line 121, in post_update
    with conn.post('/ctrl_event/' + channel, msg, headers={'Content-Type': 'text/plain'}) as req:
  File "/home/lucas/projetos/diggems/cpython_env/local/lib/python2.7/site-packages/geventhttpclient/client.py", line 158, in post
    return self.request('POST', request_uri, body=body, headers=headers)
  File "/home/lucas/projetos/diggems/cpython_env/local/lib/python2.7/site-packages/geventhttpclient/client.py", line 143, in request
    block_size=self.block_size, method=method.upper(), headers_type=self.headers_type)
  File "/home/lucas/projetos/diggems/cpython_env/local/lib/python2.7/site-packages/geventhttpclient/response.py", line 277, in __init__
    super(HTTPSocketPoolResponse, self).__init__(sock, **kw)
  File "/home/lucas/projetos/diggems/cpython_env/local/lib/python2.7/site-packages/geventhttpclient/response.py", line 149, in __init__
    self._read_headers()
  File "/home/lucas/projetos/diggems/cpython_env/local/lib/python2.7/site-packages/geventhttpclient/response.py", line 169, in _read_headers
    data = self._sock.recv(self.block_size)
  File "/home/lucas/projetos/diggems/cpython_env/local/lib/python2.7/site-packages/gevent/socket.py", line 392, in recv
    self._wait(self._read_event)
  File "/home/lucas/projetos/diggems/cpython_env/local/lib/python2.7/site-packages/gevent/socket.py", line 298, in _wait
    self.hub.wait(watcher)
  File "/home/lucas/projetos/diggems/cpython_env/local/lib/python2.7/site-packages/gevent/hub.py", line 348, in wait
    result = waiter.get()
  File "/home/lucas/projetos/diggems/cpython_env/local/lib/python2.7/site-packages/gevent/hub.py", line 575, in get
    return self.hub.switch()
  File "/home/lucas/projetos/diggems/cpython_env/local/lib/python2.7/site-packages/gevent/hub.py", line 338, in switch
    return greenlet.switch(self)
LoopExit: This operation would block forever

At line 121 of /home/lucas/projetos/diggems/game/game_helpers.py, there is:

def post_update(channel, msg):
    conn = http_cli.get_conn("http://127.0.0.1:8080/")
    with conn.post('/ctrl_event/' + channel, msg, headers={'Content-Type': 'text/plain'}) as req:
        resp = req.read()

and this http_cli module is:

from geventhttpclient import HTTPClient

_pool = {}

def get_conn(base_url):
    try:
        return _pool[base_url]
    except KeyError:
        cli = HTTPClient.from_url(base_url, connection_timeout=60, network_timeout=60, concurrency=5)
        _pool[base_url] = cli
        return cli

Is there anything wrong with this usage? My idea was for this pool dictionary to cache the HTTPClients for the servers my application must connect to...

So, do you have any clue what may be causing that dreaded exception "LoopExit: This operation would block forever"? Is it the fact that I am using the Django development server, which is gevent-unaware?

Segmentation fault

from geventhttpclient import HTTPClient
from geventhttpclient.url import URL

baseUrl = URL("http://www.abfall.mobi")
http = HTTPClient(host=baseUrl.host)
res = http.get('')

always segmentation fault....

====
(.venv)MacBook-Pro:sc Dave$ python3
Python 3.6.0 (default, Dec 24 2016, 08:02:28)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

from geventhttpclient import HTTPClient
from geventhttpclient.url import URL

baseUrl = URL("http://www.abfall.mobi")
http = HTTPClient(host=baseUrl.host)
res = http.get('')
Segmentation fault: 11

Tests fail with django installed because of gevent.monkey.patch_all()

Hi, it looks like tests fail with django installed, as reported here: https://bugs.gentoo.org/659372
It was quite easy to reproduce and seems to be a problem with pytest. Creating a file containing only this:

import gevent.monkey
gevent.monkey.patch_all()

and running py.test_monkey.py also triggers an error.

Here is the trace from geventhttpclient's tests:

=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.6.5, pytest-3.4.1, py-1.5.3, pluggy-0.6.0
rootdir: /home/sbraz/test/geventhttpclient, inifile:
plugins: requests-mock-1.5.0, mock-1.6.3, localserver-0.3.7, httpbin-0.2.3, hypothesis-3.59.1, backports.unittest-mock-1.3                                                                                        
collected 72 items / 1 errors                                                                                                                                                                                     

===================================================================================================== ERRORS ======================================================================================================
___________________________________________________________________________ ERROR collecting src/geventhttpclient/tests/test_headers.py ___________________________________________________________________________
src/geventhttpclient/tests/test_headers.py:5: in <module>
    gevent.monkey.patch_all()
/usr/lib64/python3.6/site-packages/gevent/monkey.py:611: in patch_all
    patch_thread(Event=Event, _warnings=_warnings)
/usr/lib64/python3.6/site-packages/gevent/monkey.py:348: in patch_thread
    _patch_existing_locks(threading_mod)
/usr/lib64/python3.6/site-packages/gevent/monkey.py:263: in _patch_existing_locks
    if isinstance(o, rlock_type):
../../.local/lib64/python3.6/site-packages/django/utils/functional.py:215: in inner
    self._setup()
../../.local/lib64/python3.6/site-packages/django/conf/__init__.py:41: in _setup
    % (desc, ENVIRONMENT_VARIABLE))
E   django.core.exceptions.ImproperlyConfigured: Requested settings, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================================================================= 1 error in 0.19 seconds =============================================================================================

win32 compile fails

Encountered while testing on win32 2.7 Appveyor builds (https://ci.appveyor.com/project/jayvdb/anyhttp/build/1.0.36/job/7ebd46rall5autpi)

Searching for geventhttpclient
Best match: geventhttpclient [unknown version]
Doing git clone from https://github.com/gwik/geventhttpclient to c:\users\appveyor\appdata\local\temp\1\easy_install-tsr57v\geventhttpclient
Processing geventhttpclient
Writing c:\users\appveyor\appdata\local\temp\1\easy_install-tsr57v\geventhttpclient\setup.cfg
Running setup.py -q bdist_egg --dist-dir c:\users\appveyor\appdata\local\temp\1\easy_install-tsr57v\geventhttpclient\egg-dist-tmp-9wyj14
_parser.c
ext/_parser.c(188) : error C2275: 'size_t' : illegal use of this type as an expression 
        c:\program files (x86)\common files\microsoft\visual c++ for python\9.0\vc\include\codeanalysis\sourceannotations.h(19) : see declaration of 'size_t' 
ext/_parser.c(188) : error C2146: syntax error : missing ';' before identifier 'nread' 
ext/_parser.c(188) : error C2065: 'nread' : undeclared identifier 
ext/_parser.c(192) : error C2275: 'PyObject' : illegal use of this type as an expression 
        c:\python27\include\object.h(108) : see declaration of 'PyObject' 
ext/_parser.c(192) : error C2065: 'exception' : undeclared identifier 
ext/_parser.c(193) : error C2065: 'exception' : undeclared identifier 
ext/_parser.c(193) : warning C4047: '!=' : 'int' differs in levels of indirection from 'void *' 
ext/_parser.c(199) : error C2065: 'nread' : undeclared identifier 
ext/_parser.c(334) : error C2275: 'PyObject' : illegal use of this type as an expression 
        c:\python27\include\object.h(108) : see declaration of 'PyObject' 
ext/_parser.c(334) : error C2065: 'httplib' : undeclared identifier 
ext/_parser.c(335) : error C2275: 'PyObject' : illegal use of this type as an expression 
        c:\python27\include\object.h(108) : see declaration of 'PyObject' 
ext/_parser.c(335) : error C2065: 'HTTPException' : undeclared identifier 
ext/_parser.c(335) : error C2065: 'httplib' : undeclared identifier 
ext/_parser.c(335) : warning C4047: 'function' : 'PyObject *' differs in levels of indirection from 'int' 
ext/_parser.c(335) : warning C4024: 'PyObject_GetAttrString' : different types for formal and actual parameter 1 
ext/_parser.c(338) : error C2065: 'HTTPException' : undeclared identifier 
ext/_parser.c(338) : warning C4047: 'function' : 'PyObject *' differs in levels of indirection from 'int' 
ext/_parser.c(338) : warning C4024: 'PyErr_NewException' : different types for formal and actual parameter 2 
error: Setup script exited with error: command 'C:\\Program Files (x86)\\Common Files\\Microsoft\\Visual C++ for Python\\9.0\\VC\\Bin\\cl.exe' failed with exit status 2
Command exited with code 1

hostname verification

As far as I can tell, you’re verifying certificates but not whether they belong to the hostname you connect to. That makes the verification of certificates more or less worthless because I could simply pretend to be Google.com as long as the certificate I present is valid (i.e. even with the certificate for my own homepage).

You don’t have to implement it yourself; there’s a backport from the standard library: https://pypi.python.org/pypi/backports.ssl_match_hostname/
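
For comparison, here is a minimal sketch of what hostname verification looks like with the Python 3 standard library, which has the check built in (the backport linked above targets older interpreters). open_verified is a hypothetical helper, not part of geventhttpclient, and it is only defined here, not called, so the sketch needs no network access.

```python
import socket
import ssl


def open_verified(host, port=443, timeout=10.0):
    """Open a TLS connection that checks the chain and the hostname."""
    ctx = ssl.create_default_context()
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        # server_hostname is the name the certificate is matched against;
        # a mismatch makes the handshake fail with an ssl.SSLError subclass
        return ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise


# The default context enables both checks without extra configuration
ctx = ssl.create_default_context()
checks_hostname = ctx.check_hostname
requires_cert = ctx.verify_mode == ssl.CERT_REQUIRED
```

Verifying the chain alone only shows that some trusted CA issued the certificate; the hostname check is what ties the certificate to the server being contacted.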

Cannot set proxy

  File "test_geventhttp.py", line 17, in <module>
    http = HTTPClient.from_url(url, concurrency=10, proxy_host='asd', proxy_port=80)
  File "C:\Python27\lib\site-packages\geventhttpclient-1.0a-py2.7-win32.egg\geventhttpclient\client.py", line 29, in from_url
    return HTTPClient(url.host, port=url.port, ssl=enable_ssl, **kw)
  File "C:\Python27\lib\site-packages\geventhttpclient-1.0a-py2.7-win32.egg\geventhttpclient\client.py", line 46, in __init__
    connection_host = self.proxy_host
AttributeError: 'HTTPClient' object has no attribute 'proxy_host'

That is happening because of this initialization of the proxy:

class HTTPClient(object):
    def __init__(self, host, port=None, headers={},
                 block_size=BLOCK_SIZE,
                 connection_timeout=ConnectionPool.DEFAULT_CONNECTION_TIMEOUT,
                 network_timeout=ConnectionPool.DEFAULT_NETWORK_TIMEOUT,
                 disable_ipv6=False,
                 concurrency=1, ssl_options=None, ssl=False,
                 proxy_host=None, proxy_port=None, version=HTTP_11):
        self.host = host
        self.port = port
        connection_host = self.host
        connection_port = self.port
        if proxy_host is not None:
            assert proxy_port is not None, \
                'you have to provide proxy_port if you have set proxy_host'
            self.use_proxy = True
            connection_host = self.proxy_host
            connection_port = self.proxy_port
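
The traceback points at the read of self.proxy_host, an attribute that is never assigned; the value is only available as the proxy_host argument. A minimal sketch of the corrected logic follows, using a hypothetical ProxyAwareClient class that mirrors only the relevant part of the constructor. It is not the library's actual fix.

```python
class ProxyAwareClient:
    """Hypothetical stand-in for the constructor logic quoted above."""

    def __init__(self, host, port=None, proxy_host=None, proxy_port=None):
        self.host = host
        self.port = port
        self.use_proxy = False
        connection_host = self.host
        connection_port = self.port
        if proxy_host is not None:
            if proxy_port is None:
                raise ValueError("proxy_port is required when proxy_host is set")
            self.use_proxy = True
            # Use the arguments, not attributes that were never assigned
            connection_host = proxy_host
            connection_port = proxy_port
        self.connection_host = connection_host
        self.connection_port = connection_port


client = ProxyAwareClient("example.com", 80, proxy_host="asd", proxy_port=80)
direct = ProxyAwareClient("example.com", 80)
```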

_parser.HTTPParseError: ('invalid constant string')

Hi,

I'm using geventhttpclient against multiple different web servers. But there is
one where I get an error when geventhttpclient tries to parse the HTTP response from the server.

  File "/usr/local/lib/python2.7/dist-packages/geventhttpclient/client.py", line 156, in request
    block_size=self.block_size, method=method.upper())
  File "/usr/local/lib/python2.7/dist-packages/geventhttpclient/response.py", line 259, in __init__
    super(HTTPSocketPoolResponse, self).__init__(sock, **kw)
  File "/usr/local/lib/python2.7/dist-packages/geventhttpclient/response.py", line 136, in __init__
    self._read_headers()
  File "/usr/local/lib/python2.7/dist-packages/geventhttpclient/response.py", line 158, in _read_headers
    self.feed(data)
_parser.HTTPParseError: ('invalid constant string', 27)

When printing that data just before the self.feed call, I got the following, and it seems there are no headers at all:

    <html>
    <head><title>400 Bad Request</title></head>  
    <body bgcolor="white"> 
    <center><h1>400 Bad Request</h1></center>
    <hr><center>nginx/0.8.55</center>
    </body>
    </html>

Anyway, could someone point me to how I can debug that?

POST requests fail if connection lost

POST requests (with a body) fail if you reuse an HTTP connection and the connection has gone away.

Surely lines 167 and 168 of client.py:

        if body:
            sock.sendall(body)

should be inserted into the try above, between lines 159 and 160 instead?

Otherwise, this is the result:

File "jim.py", line 18, in frog
response = http.post(url.request_uri,body="jim")
File "/opt/fds/pylib/geventhttpclient/client.py", line 191, in post
return self.request(METHOD_POST, request_uri, body=body, headers=headers)
File "/opt/fds/pylib/geventhttpclient/client.py", line 168, in request
sock.sendall(body)
File "/opt/fds/pylib/gevent-1.1b6-py2.7-linux-x86_64.egg/gevent/_socket2.py", line 395, in sendall
timeleft = self.__send_chunk(chunk, flags, timeleft, end)
File "/opt/fds/pylib/gevent-1.1b6-py2.7-linux-x86_64.egg/gevent/_socket2.py", line 343, in __send_chunk
data_sent += self.send(chunk, flags, timeout=timeleft)
File "/opt/fds/pylib/gevent-1.1b6-py2.7-linux-x86_64.egg/gevent/_socket2.py", line 310, in send
return sock.send(data, flags)
error: [Errno 32] Broken pipe
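
The proposed change can be sketched as follows: headers and body are sent inside the same try block, so a dead pooled connection is replaced and the whole request is sent again. This is a minimal illustration with fake sockets; send_request, StaleSocket and FreshSocket are hypothetical names, not the code in client.py. Whether resending a POST is safe in general depends on the application.

```python
class StaleSocket:
    """Fake pooled socket whose peer has gone away: every send fails."""

    def sendall(self, data):
        raise BrokenPipeError(32, "Broken pipe")


class FreshSocket:
    """Fake healthy socket that records what was sent."""

    def __init__(self):
        self.sent = []

    def sendall(self, data):
        self.sent.append(data)


def send_request(get_socket, header_bytes, body, max_attempts=2):
    """Send headers and body in one try block; retry on a fresh socket."""
    last_error = None
    for _ in range(max_attempts):
        sock = get_socket()
        try:
            sock.sendall(header_bytes)
            if body:
                sock.sendall(body)  # now covered by the same error handling
            return sock
        except OSError as exc:  # BrokenPipeError is a subclass of OSError
            last_error = exc
    raise last_error


# The pool hands out a stale connection first, then a fresh one
pool = [StaleSocket(), FreshSocket()]
used = send_request(lambda: pool.pop(0), b"POST / HTTP/1.1\r\n\r\n", b"jim")
```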

No Pip Package Published

colin@t410-clr-l:~/Dropbox/Repos/bitHopper/bitHopper$ sudo pip install geventhttpclient
Downloading/unpacking geventhttpclient
Exception in thread Thread-117:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/dist-packages/pip/index.py", line 239, in _get_queued_page
    page = self._get_page(location, req)
  File "/usr/lib/python2.7/dist-packages/pip/index.py", line 324, in _get_page
    return HTMLPage.get_page(link, req, cache=self.cache)
  File "/usr/lib/python2.7/dist-packages/pip/index.py", line 441, in get_page
    resp = urlopen(url)
  File "/usr/lib/python2.7/dist-packages/pip/download.py", line 83, in __call__
    response = urllib2.urlopen(self.get_request(url))
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 394, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 412, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1201, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1168, in do_open
    h.request(req.get_method(), req.get_selector(), req.data, headers)
  File "/usr/lib/python2.7/httplib.py", line 955, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 989, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 951, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 811, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 773, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 754, in connect
    self.timeout, self.source_address)
  File "/usr/local/lib/python2.7/dist-packages/gevent/socket.py", line 632, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/local/lib/python2.7/dist-packages/gevent/socket.py", line 754, in getaddrinfo
    job = spawn(wrap_errors(gaierror, resolve_ipv6), host, evdns_flags)
  File "/usr/local/lib/python2.7/dist-packages/gevent/greenlet.py", line 243, in spawn
    g = cls(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/gevent/greenlet.py", line 133, in __init__
    greenlet.__init__(self, parent=get_hub())
  File "/usr/local/lib/python2.7/dist-packages/gevent/hub.py", line 135, in get_hub
    raise NotImplementedError('gevent is only usable from a single thread')
NotImplementedError: gevent is only usable from a single thread

Could not find any downloads that satisfy the requirement geventhttpclient
No distributions at all found for geventhttpclient
Storing complete log in /home/colin/.pip/pip.log

Add keep_blank_values and strict_parsing support for URL class

When the geventhttpclient.url.URL class is used to parse a URL, query parameters with blank values are missing from the request. keep_blank_values and strict_parsing are optional parameters of urlparse.parse_qs; please add support for passing keep_blank_values and strict_parsing through to the urlparse.parse_qs function.
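
The behaviour of the two parameters can be seen with the standard library alone. The sketch below uses the Python 3 location of the function (urllib.parse.parse_qs, the successor of urlparse.parse_qs); the query strings are made up for the example.

```python
from urllib.parse import parse_qs

query = "a=&b=1"

# Default: parameters with blank values are silently dropped
default = parse_qs(query)

# keep_blank_values=True keeps them, with an empty string as the value
with_blanks = parse_qs(query, keep_blank_values=True)

# strict_parsing=True raises ValueError on malformed fields instead of
# ignoring them; a field without "=" is one such case
try:
    parse_qs("novalue", strict_parsing=True)
    strict_error = None
except ValueError as exc:
    strict_error = exc
```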

Move to another repository

I intend to move geventhttpclient, as well as its release management, to another repository that I don't own.
I'm more of a bottleneck than a source of value to the project these days.

Would you like that?
Who would like to maintain it?

cc @NorthIsUp @ml31415 @heyman @rmohr

Connection leak

If you test with a tool like Apache Benchmark, using for example

-c 50 -n 1000

and then check for TIME_WAIT connections with

netstat -na | grep TIME_WAIT

you will see that a lot of connections have not been closed.

Support for redirections and full cookielib compatibility

I'm currently maintaining a compatibility layer for different HTTP implementations. This happened because, as requirements grew, I switched from urllib2 to restkit and now to geventhttpclient in some data collecting software I'm currently developing. Support for automatic redirections, and the ability to work as a drop-in replacement for restkit as well as urllib2 with support for cookielib, could be transferred to geventhttpclient quite easily. I'd implement this as a subclass of the current HTTPClient.

The cookie support for geventhttpclient could also be added by subclassing CookieJar and making adjustments there, but this would probably be more code than the other way round, as CookieJar basically just requires some aliases for already existing functions, and the new headers container is already compatible.

Bug in httplib2 patch

I tried to use the httplib2 patch as shown in the geventhttpclient README.

import geventhttpclient.httplib
geventhttpclient.httplib.patch()

import httplib2

h = httplib2.Http()
h.request("http://api.gdax.com/currencies", "GET")

Here is the error I got:

C:\Users\aqv13\AppData\Local\Programs\Python\Python36-32\python.exe C:/Users/aqv13/PycharmProjects/untitled1/main.py
Traceback (most recent call last):
  File "C:/Users/aqv13/PycharmProjects/untitled1/main.py", line 7, in <module>
    h.request("http://api.gdax.com/currencies", "GET")
  File "C:\Users\aqv13\AppData\Local\Programs\Python\Python36-32\lib\site-packages\httplib2\__init__.py", line 1322, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "C:\Users\aqv13\AppData\Local\Programs\Python\Python36-32\lib\site-packages\httplib2\__init__.py", line 1124, in _request
    headers=headers, redirections=redirections - 1)
  File "C:\Users\aqv13\AppData\Local\Programs\Python\Python36-32\lib\site-packages\httplib2\__init__.py", line 1204, in request
    self.disable_ssl_certificate_validation)
  File "C:\Users\aqv13\AppData\Local\Programs\Python\Python36-32\lib\site-packages\httplib2\__init__.py", line 858, in __init__
    check_hostname=disable_ssl_certificate_validation ^ True)
  File "C:\Users\aqv13\AppData\Local\Programs\Python\Python36-32\lib\site-packages\geventhttpclient\httplib.py", line 118, in __init__
    HTTPConnection.__init__(self, host, port, **kw)
  File "C:\Users\aqv13\AppData\Local\Programs\Python\Python36-32\lib\site-packages\geventhttpclient\httplib.py", line 95, in __init__
    HTTPLibConnection.__init__(self, *args, **kw)
TypeError: __init__() got an unexpected keyword argument 'context'

Process finished with exit code 1

PyPi Package

Please upload this project as a proper package so that it doesn't have to be installed by hand.

please update pypi package

Hi,
Please update the PyPI package; the geventhttpclient version on PyPI does not match the current commit on GitHub.

Thank you.

timeout socket

Hi,

First, geventhttpclient is awesome - you get the asynchronous behavior without the need for evented APIs! nice. Sadly, there are some issues still. In one of my use cases I create a HTTPClient like this:

HTTPClient.from_url('http://%s:%s'%(self._host.name, self._host.port),
                        concurrency         = _cvalue,
                        connection_timeout  = _tvalue,
                        network_timeout     = _tvalue)

Initially I was caching the instance of the client, but now I create it every time I issue a request (in the case of the expensive db calls I do in my simple test setup I see almost no differences). However, after _tvalue passes when I create the client again (in subsequent requests) I am getting the following error:

File "/home/torque/tmp/env/lib/python2.7/site-packages/geventhttpclient-1.0a-py2.7-linux-x86_64.egg/geventhttpclient/client.py", line 168, in post
    return self.request('POST', request_uri, body=body, headers=headers)
  File "/home/torque/tmp/env/lib/python2.7/site-packages/geventhttpclient-1.0a-py2.7-linux-x86_64.egg/geventhttpclient/client.py", line 144, in request
    sock = self._connection_pool.get_socket()
  File "/home/torque/tmp/env/lib/python2.7/site-packages/geventhttpclient-1.0a-py2.7-linux-x86_64.egg/geventhttpclient/connectionpool.py", line 96, in get_socket
    return self._create_socket()
  File "/home/torque/tmp/env/lib/python2.7/site-packages/geventhttpclient-1.0a-py2.7-linux-x86_64.egg/geventhttpclient/connectionpool.py", line 82, in _create_socket
    sock.connect(sock_info[-1])
  File "/home/torque/tmp/env/lib/python2.7/site-packages/gevent/socket.py", line 392, in connect
    wait_readwrite(sock.fileno(), timeout=timeleft, event=self._rw_event)
  File "/home/torque/tmp/env/lib/python2.7/site-packages/gevent/socket.py", line 215, in wait_readwrite
    switch_result = get_hub().switch()
  File "/home/torque/tmp/env/lib/python2.7/site-packages/gevent/hub.py", line 164, in switch
    return greenlet.switch(self)
timeout: timed out

Am I using geventhttpclient wrongly? How can this be fixed? Thanks a lot!

Cheers,
Cosmin

Bug in geventhttpclient.response.py

Windows, geventhttpclient==1.3.1.
Running the benchmark's httpclient.py raises an AttributeError, "'dict' object has no attribute 'add'", at line 145 of geventhttpclient/response.py, where the header is flushed. The original code is

    def _flush_header(self):
        if self._current_header_field is not None:
            self._headers_index.add(self._current_header_field,
                                    self._current_header_value)
            self._header_position += 1
            self._current_header_field = None
            self._current_header_value = None

The _headers_index of class HTTPResponse is a geventhttpclient.header.Headers object; when dict is used as headers_type for maximum speed, it may be fixed as follows:

    def _flush_header(self):
        if self._current_header_field is not None:
            key = self._current_header_field.lower()
            self._headers_index[key] = self._headers_index.get(key, self._current_header_value)
            self._header_position += 1
            self._current_header_field = None
            self._current_header_value = None

Bad descriptors with master

Traceback (most recent call last):
  File "***", line 285, in request
    return HTTPConnectionPool.request(self, method, url, fields, headers, **urlopen_kw)
  File "***/lib/python2.7/site-packages/urllib3/request.py", line 75, in request
    **urlopen_kw)
  File "***/lib/python2.7/site-packages/urllib3/request.py", line 88, in request_encode_url
    return self.urlopen(method, url, **urlopen_kw)
  File "***/lib/python2.7/site-packages/urllib3/connectionpool.py", line 536, in urlopen
    conn = self._get_conn(timeout=pool_timeout)
  File "***/lib/python2.7/site-packages/urllib3/connectionpool.py", line 294, in _get_conn
    if conn and is_connection_dropped(conn):
  File "***/lib/python2.7/site-packages/urllib3/util.py", line 490, in is_connection_dropped
    return select([sock], [], [], 0.0)[0]
  File "***/lib/python2.7/site-packages/gevent/select.py", line 67, in select
    raise error(*ex.args)
error: (9, 'Bad file descriptor')

Care to provide an explanation for why this happens so I'll be able to provide a fix?

cannot use library when ssl is not installed

Hi,

I often use Python running in a ./local folder on a Linux box I don't have root/sudo access to. Installing ssl is usually complicated (Python needs some manual tinkering) and on top of that I don't need ssl support in Python in the first place.
The problem is that when I use geventhttpclient it crashes on import with the classic "ImportError: No module named _ssl". The ssl module is mainly used in the httplib module for the HTTPSConnection.

A solution would be to conditionally import ssl and expose HTTPSConnection if ssl is present, enabling the rest of the functionality even if ssl is not installed. Thanks,

Cheers,
Cosmin
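
The conditional import suggested above could look roughly like the following. This is only a sketch written against Python 3's http.client rather than the library's own httplib module; HAS_SSL and connection_class are hypothetical names for illustration and are not part of geventhttpclient.

```python
# Sketch only: illustrates a guarded ssl import, not geventhttpclient's real code.
import http.client

try:
    import ssl  # raises ImportError when Python was built without _ssl
except ImportError:
    ssl = None

HAS_SSL = ssl is not None  # hypothetical flag name

# Plain HTTP never needs ssl, so it is always exposed.
HTTPConnection = http.client.HTTPConnection

# Expose the HTTPS class only when ssl could be imported.
if HAS_SSL:
    HTTPSConnection = getattr(http.client, "HTTPSConnection", None)
else:
    HTTPSConnection = None


def connection_class(scheme):
    """Return the connection class for a URL scheme (hypothetical helper)."""
    if scheme == "http":
        return HTTPConnection
    if scheme == "https":
        if HTTPSConnection is None:
            raise RuntimeError("https requested but the ssl module is not available")
        return HTTPSConnection
    raise ValueError("unsupported scheme: %r" % (scheme,))
```

With this shape, importing the module succeeds without ssl, and the failure is deferred to the moment an https URL is actually requested.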

Python 2.7.10 using ssl.get_default_verify_paths() returns directory path rather than full path to cert file

Hey,
I am using Python 2.7.10.
The trouble I am having is related to this part:
https://github.com/gwik/geventhttpclient/blob/master/src/geventhttpclient/connectionpool.py#L4-L15

since

    _certs = get_default_verify_paths()
    _CA_CERTS = _certs.cafile or _certs.capath

returns a directory path (_certs.cafile is None, so the value comes from _certs.capath) rather than the full path to a file, it fails. Currently the only solution I see is to just use certifi. If I comment out the line that assigns _CA_CERTS, it falls back to:

    import certifi
    _CA_CERTS = certifi.where()

and everything seems to work fine then.
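
One way to avoid the problem without commenting anything out might be to accept the system path only when it is a regular file. The sketch below assumes the caller needs a single bundle file; pick_ca_file and default_ca_file are hypothetical helper names, not functions in geventhttpclient, and certifi is treated as optional.

```python
import os
import ssl


def pick_ca_file(cafile, fallback):
    """Return cafile if it names an existing regular file, else fallback.

    A directory (such as an OpenSSL capath) is rejected on purpose, because
    this sketch assumes the caller needs one bundle file.
    """
    if cafile and os.path.isfile(cafile):
        return cafile
    return fallback


def default_ca_file():
    """Prefer the system CA file; fall back to certifi when it is installed."""
    paths = ssl.get_default_verify_paths()
    try:
        import certifi  # third-party package, optional in this sketch
        fallback = certifi.where()
    except ImportError:
        fallback = None
    return pick_ca_file(paths.cafile, fallback)
```

Note that only cafile is consulted, so a capath directory like the one reported here never ends up where a file is expected.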

How to construct the post body?

I have tried many times. I want to POST to a URL and get something back. I found that client.py has methods such as "post" and "request", and they all need a parameter named "body", but I don't know how to construct this body. I just have a dict like this:
data = {"name":"myname","password":"mypassword"}
When I use
urllib.urlencode(data)

to construct the "body", it does not work. When I use urllib2 to post, it works.

So what can I do?
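
A possible cause, which the post does not confirm, is a missing Content-Type header: urllib2 adds application/x-www-form-urlencoded automatically when data is supplied, while a lower-level client may send the body without it. The sketch below uses Python 3's urllib.parse; build_form_post is a hypothetical helper, and the commented-out client call is an assumption about how it would be passed on.

```python
from urllib.parse import urlencode


def build_form_post(data):
    """Encode a dict as a form body and return it with a matching header."""
    body = urlencode(data)
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return body, headers


body, headers = build_form_post({"name": "myname", "password": "mypassword"})
# body is now "name=myname&password=mypassword"

# Assumed usage with an existing client object (not executed here):
# response = client.post(request_uri, body=body, headers=headers)
```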

CERTIFICATE_VERIFY_FAILED on some sites / certify cacert.pem / get_default_verify_paths

Ubuntu provides a directory, not a file, for CAs, so you use certifi.where() in your connectionpool.py code.

I can't figure this out. All I know is that requests uses a pretty similar, if not identical, cacert.pem and is having no issues.

if I do

from geventhttpclient.useragent import UserAgent

ua = UserAgent()
ua.urlopen('https://bison.streethawk.com/v1/installs/push_history')

I get ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)

All I can do is disable verification for now.

As I said, I can use requests with no issues, but it is not fit for purpose.

is pypy supported - error on import

Here is how I install PyPy:
wget https://bitbucket.org/pypy/pypy/downloads/pypy-2.5.0-linux64.tar.bz2
tar jxf pypy-2.5.0-linux64.tar.bz2
mv pypy-2.5.0-linux64 /opt/pypy
cd /opt/pypy/site-packages/
wget http://nightly.ziade.org/distribute_setup.py
/opt/pypy/bin/pypy distribute_setup.py
ln -s /opt/pypy/bin/pypy /usr/bin/pypy

/opt/pypy/bin/easy_install greenlet
/opt/pypy/bin/easy_install git+git://github.com/surfly/gevent.git#egg=gevent
/opt/pypy/bin/easy_install geventhttpclient

This is the error I get on Ubuntu 14.04:

Python 2.7.8 (10f1b29a2bd2, Feb 02 2015, 21:22:43)
[PyPy 2.5.0 with GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> from geventhttpclient import HTTPClient
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/pypy/site-packages/geventhttpclient-1.1.0-py2.7-linux-x86_64.egg/geventhttpclient/__init__.py", line 5, in <module>
    from geventhttpclient.client import HTTPClient
  File "/opt/pypy/site-packages/geventhttpclient-1.1.0-py2.7-linux-x86_64.egg/geventhttpclient/client.py", line 3, in <module>
    from geventhttpclient.response import HTTPSocketPoolResponse
  File "/opt/pypy/site-packages/geventhttpclient-1.1.0-py2.7-linux-x86_64.egg/geventhttpclient/response.py", line 2, in <module>
    from geventhttpclient._parser import HTTPResponseParser, HTTPParseError #@UnresolvedImport
  File "/opt/pypy/site-packages/geventhttpclient-1.1.0-py2.7-linux-x86_64.egg/geventhttpclient/_parser.py", line 7, in <module>
    bootstrap()
  File "/opt/pypy/site-packages/geventhttpclient-1.1.0-py2.7-linux-x86_64.egg/geventhttpclient/_parser.py", line 6, in bootstrap
    imp.load_dynamic(__name__,__file__)
ImportError: unable to load extension module '/opt/pypy/site-packages/geventhttpclient-1.1.0-py2.7-linux-x86_64.egg/geventhttpclient/_parser.pypy-25.so': /opt/pypy/site-packages/geventhttpclient-1.1.0-py2.7-linux-x86_64.egg/geventhttpclient/_parser.pypy-25.so: undefined symbol: PyByteArray_FromStringAndSize

Multiple headerlines with same field name reduced to last entry

The current implementation of HTTP headers as a simple dictionary does not support the case where a page, for example, wants to set multiple cookies at the same time by sending several "Set-Cookie" header lines. This behaviour is standards compatible as far as I know and not that uncommon. Unfortunately the current implementation of HTTPResponse._headers_index is a plain dictionary, which cannot hold several entries with the same key.

My suggestion would be to replace this dictionary with a more flexible container class. I understand that this is still one of the speed-critical parts and any overhead there is less than optimal. Nevertheless, there are cases where functionality and standards compatibility have higher priority. So which container class to use for the response headers could be an init option for the HTTPClient.

In the same way, the request builder should also be able to handle lists as dictionary values, in order to create multiple header lines with the same field.

If these changes are welcome, I'll put my local changes together and create a patch for that issue.

Michael
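
To illustrate the kind of container the proposal describes, here is a minimal sketch. MultiHeaders and expand_request_headers are hypothetical names invented for this example; they are not the patch mentioned above and not geventhttpclient's API.

```python
class MultiHeaders:
    """Case-insensitive header container that keeps repeated fields (sketch)."""

    def __init__(self):
        # (original_name, value) pairs, kept in arrival order
        self._items = []

    def add(self, name, value):
        """Record one header line without overwriting earlier ones."""
        self._items.append((name, value))

    def get_all(self, name):
        """Return every value seen for a field, matched case-insensitively."""
        wanted = name.lower()
        return [v for n, v in self._items if n.lower() == wanted]

    def get(self, name, default=None):
        """Return the last value for a field, like a plain dict would keep."""
        values = self.get_all(name)
        return values[-1] if values else default


def expand_request_headers(headers):
    """Turn {"X": ["a", "b"], "Y": "c"} into one (name, value) pair per line."""
    lines = []
    for name, value in headers.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            lines.append((name, v))
    return lines


h = MultiHeaders()
h.add("Set-Cookie", "a=1")
h.add("set-cookie", "b=2")
cookies = h.get_all("Set-Cookie")  # both cookie lines are preserved
```

The list-backed design trades lookup speed for fidelity: lookups scan all entries, which is the overhead the post anticipates for this speed-critical path.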
