
HTTPX

HTTPX - A next-generation HTTP client for Python.


HTTPX is a fully featured HTTP client library for Python 3. It includes an integrated command line client, has support for both HTTP/1.1 and HTTP/2, and provides both sync and async APIs.


Install HTTPX using pip:

pip install httpx

Now, let's get started:

>>> import httpx
>>> r = httpx.get('https://www.example.org/')
>>> r
<Response [200 OK]>
>>> r.status_code
200
>>> r.headers['content-type']
'text/html; charset=UTF-8'
>>> r.text
'<!doctype html>\n<html>\n<head>\n<title>Example Domain</title>...'
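
HTTPX also provides an async API. A minimal equivalent of the request above, using httpx.AsyncClient:

import asyncio
import httpx

async def main():
    # The async client is used as a context manager so that
    # connections are cleaned up when we're done.
    async with httpx.AsyncClient() as client:
        r = await client.get('https://www.example.org/')
        print(r.status_code)  # 200

asyncio.run(main())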

Or, using the command-line client.

pip install 'httpx[cli]'  # The command line client is an optional dependency.

Which now allows us to use HTTPX directly from the command-line...

httpx --help

Sending a request...

httpx http://httpbin.org/json

Features

HTTPX builds on the well-established usability of requests, and gives you:

  • A broadly requests-compatible API.
  • An integrated command-line client.
  • HTTP/1.1 and HTTP/2 support.
  • Standard synchronous interface, but with async support if you need it.
  • Ability to make requests directly to WSGI applications or ASGI applications.
  • Strict timeouts everywhere.
  • Fully type annotated.
  • 100% test coverage.

Plus all the standard features of requests...

  • International Domains and URLs
  • Keep-Alive & Connection Pooling
  • Sessions with Cookie Persistence
  • Browser-style SSL Verification
  • Basic/Digest Authentication
  • Elegant Key/Value Cookies
  • Automatic Decompression
  • Automatic Content Decoding
  • Unicode Response Bodies
  • Multipart File Uploads
  • HTTP(S) Proxy Support
  • Connection Timeouts
  • Streaming Downloads
  • .netrc Support
  • Chunked Requests

Installation

Install with pip:

pip install httpx

Or, to include the optional HTTP/2 support, use:

pip install 'httpx[http2]'

HTTPX requires Python 3.8+.

Documentation

Project documentation is available at https://www.python-httpx.org/.

For a run-through of all the basics, head over to the QuickStart.

For more advanced topics, see the Advanced Usage section, the async support section, or the HTTP/2 section.

The Developer Interface provides a comprehensive API reference.

To find out about tools that integrate with HTTPX, see Third Party Packages.

Contribute

If you want to contribute to HTTPX, check out the Contributing Guide to learn how to get started.

Dependencies

The HTTPX project relies on these excellent libraries:

  • httpcore - The underlying transport implementation for httpx.
    • h11 - HTTP/1.1 support.
  • certifi - SSL certificates.
  • idna - Internationalized domain name support.
  • sniffio - Async library autodetection.

As well as these optional installs:

  • h2 - HTTP/2 support. (Optional, with httpx[http2])
  • socksio - SOCKS proxy support. (Optional, with httpx[socks])
  • rich - Rich terminal support. (Optional, with httpx[cli])
  • click - Command line client support. (Optional, with httpx[cli])
  • brotli or brotlicffi - Decoding for "brotli" compressed responses. (Optional, with httpx[brotli])
  • zstandard - Decoding for "zstd" compressed responses. (Optional, with httpx[zstd])

A huge amount of credit is due to requests for the API layout that much of this work follows, as well as to urllib3 for plenty of design inspiration around the lower-level networking details.


HTTPX is BSD licensed code.
Designed & crafted with care.

— 🦋 —


httpx's Issues

Pull adapters into client class

I think we'll probably want to drop the existing adapters layering, and just have Client / Dispatcher. Refactoring away should be easy enough, given the annotations and coverage, so will probably just continue in this vein for the moment, but I think it'd be a better way around.

Proxy support

Best place to look would be urllib3.
Will need to be handled by the "dispatch" layer.

Cookie persistence

We currently have adapters/cookies.py stubbed out for this.

A precursor to persistence should probably be adding a Cookies class and cookie interfaces on the request and response. Once we support sending and getting cookies back, then we can also add the cookie persistence layer.

Environment support

Needs to handle anything that requests currently deals with when trust_env=True is set.
.netrc, REQUEST_CA_BUNDLE and anything else relevant.

First step to dealing with this would be to outline exactly what set of stuff requests includes. If anyone wants to dig into this and comment on this issue comprehensively with what set of behavior we need, then that'd be a great start.

We have adapters/environment.py currently stubbed out to deal with adding in this behavior.

Combining efforts?

Hey, I just discovered this. Looks pretty cool!

I don't know if you're aware, but we've been working on a similar project for a while now:

Some key features:

  • Based on urllib3, rather than a rewrite, to take advantage of their years of work finding edge-cases. Supports tricky stuff like proxies and early responses.
  • Designed to support multiple sync and async backends (asyncio, trio, twisted, ...) within a single code base
  • Python 2 support (not sure how important this is – we might end up dropping it – but there are some important projects like botocore and pip that want to support async but won't be able to drop python 2 for a few years yet)
  • Working with the urllib3 devs to manage the transition: python-trio/hip#84

I'd love to hear more about what you're planning, and compare notes. Could do it here, or do a video chat on Monday, or perhaps we could meet up at PyCon this week?

AsyncClient(app=asgi_app) does not run startup/shutdown functions

This might be by design, but it seems like the client, when wrapping an ASGI app, does not trigger the startup/shutdown lifespan events or wait for their event handlers.

If the only thing the startup/shutdown handlers do is connect to and disconnect from the database, I might as well solve this with additional fixtures in my test setup. My concern is keeping the startup/shutdown handlers and the test setup in sync once the handlers grow larger.

If the lack of lifespan support is by design, I'm OK with that, but if so we should document it.

If this isn't by design, there's an implementation based upon requests_async.ASGISession here that looks quite adaptable to http3.AsyncClient.


Requirements:

http3==0.5.0
pytest
pytest-asyncio
starlette

Test case:

import asyncio

import http3

import pytest

from starlette.applications import Starlette
from starlette.responses import PlainTextResponse


startup_was_called = asyncio.Event()


@pytest.fixture
def app():
    app = Starlette(debug=True)

    @app.on_event("startup")
    def startup():
        startup_was_called.set()

    @app.route("/")
    async def homepage(request):
        await asyncio.sleep(1)
        return PlainTextResponse("Hello, world!")

    return app


@pytest.fixture
def http_client(app):
    return http3.AsyncClient(app=app)


@pytest.mark.asyncio
async def test_async_test_client_triggers_app_startup(http_client):
    result = await http_client.get("http://www.example.com/")

    assert result.text == "Hello, world!"
    assert startup_was_called.is_set()

Result:

____________________________________________ test_async_test_client_triggers_app_startup _____________________________________________

http_client = <http3.client.AsyncClient object at 0x7f11a35b5dd8>

    @pytest.mark.asyncio
    async def test_async_test_client_triggers_app_startup(http_client):
        result = await http_client.get("http://www.example.com/")
    
        assert result.text == "Hello, world!"
>       assert startup_was_called.is_set()
E       assert False
E        +  where False = <bound method Event.is_set of <asyncio.locks.Event object at 0x7f11a39a3dd8 [unset]>>()
E        +    where <bound method Event.is_set of <asyncio.locks.Event object at 0x7f11a39a3dd8 [unset]>> = <asyncio.locks.Event object at 0x7f11a39a3dd8 [unset]>.is_set

test_http3_async_test_client.py:40: AssertionError
================================================ 1 failed in 1.09 seconds =================================================
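
As a workaround until this is resolved, the lifespan events can be driven from the test setup itself. A sketch using the third-party asgi-lifespan package (an assumption on my part, not something discussed above) to run startup/shutdown around the client:

import http3
import pytest
from asgi_lifespan import LifespanManager  # third-party helper, assumed available


@pytest.mark.asyncio
async def test_with_lifespan(app):
    # Drive the app's startup/shutdown events manually, since the
    # client wrapping the app does not do it for us.
    async with LifespanManager(app):
        client = http3.AsyncClient(app=app)
        result = await client.get("http://www.example.com/")
        assert result.text == "Hello, world!"
        assert startup_was_called.is_set()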

Redirects

I guess we want to support redirects at this level, but it's not 100% clear.

Allow using the async API without instantiating a client

Hi!

It would be more intuitive and more symmetric if the asynchronous API were accessible without instantiating a client.

e.g.

from httpx import aiohttpx
r = await aiohttpx.get('https://www.example.com/')

Thank you for this wonderful project.

`Bad file descriptor` when use httpcore to request https site

I found the problem when using requests_async to request an https site. After reading the code, I think it may be a bug in httpcore.

Env

  • Python 3.7.2
  • httpcore 0.2.0

Steps to reproduce the problem

import asyncio
import httpcore


async def f():
    ssl = httpcore.SSLConfig(cert=None, verify=False)
    http = httpcore.ConnectionPool(ssl=ssl)
    response = await http.request('GET', 'https://httpbin.org/get')
    print(response.body)


loop = asyncio.get_event_loop()
loop.set_debug(True)
loop.run_until_complete(f())
loop.close()

The above code will report:

Fatal write error on socket transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x7f145cd000b8>
transport: <_SelectorSocketTransport fd=6>
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/selector_events.py", line 857, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor
Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x7f145cd000b8>
transport: <_SelectorSocketTransport closing fd=6>
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/selector_events.py", line 857, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/sslproto.py", line 676, in _process_write_backlog
    self._transport.write(chunk)
  File "/usr/lib/python3.7/asyncio/selector_events.py", line 861, in write
    self._fatal_error(exc, 'Fatal write error on socket transport')
  File "/usr/lib/python3.7/asyncio/selector_events.py", line 670, in _fatal_error
    self._force_close(exc)
  File "/usr/lib/python3.7/asyncio/selector_events.py", line 682, in _force_close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 688, in call_soon
    self._check_closed()
  File "/usr/lib/python3.7/asyncio/base_events.py", line 480, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

It seems that the socket has already been closed when sending the SSL close_notify message. And when I use python-dbg to run the code, there are some warnings:

sys:1: ResourceWarning: unclosed <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.0.2.15', 59830), raddr=('
34.238.32.178', 443)>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.7/asyncio/sslproto.py:322: ResourceWarning: unclosed transport <asyncio.sslproto._SSLProtocolTransport object at 0x7f548e936c88>
  source=self)

I think that when execution moves out of the coroutine's scope, the socket gets closed automatically through GC. But _SSLProtocolTransport then tries to send some bytes (__del__ is called after the socket is closed).

https://github.com/python/cpython/blob/d246a6766b9d8cc625112906299c4cb019944300/Lib/asyncio/sslproto.py#L319

    def close(self):
        """Close the transport.
        Buffered data will be flushed asynchronously.  No more data
        will be received.  After all buffered data is flushed, the
        protocol's connection_lost() method will (eventually) be called
        with None as its argument.
        """
        self._closed = True
        self._ssl_protocol._start_shutdown()

    def __del__(self, _warn=warnings.warn):
        if not self._closed:
            _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
        self.close()

But it looks strange: _SSLProtocolTransport should still have a reference to the socket, so the socket should not be closed before it is. I really don't know why; please help me.

Keep-Alive timeouts

We should automatically close any Keep-Alive connections after a timeout period.
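
A minimal sketch of the bookkeeping this implies, assuming a hypothetical pool that records when each connection goes idle (the names and the expiry value are illustrative):

import time

KEEPALIVE_EXPIRY = 5.0  # seconds a connection may sit idle before we close it


class KeepAliveTracker:
    def __init__(self):
        self._idle_since = {}  # connection -> time it was returned to the pool

    def mark_idle(self, connection):
        self._idle_since[connection] = time.monotonic()

    def mark_active(self, connection):
        self._idle_since.pop(connection, None)

    def close_expired(self):
        now = time.monotonic()
        for connection, since in list(self._idle_since.items()):
            if now - since > KEEPALIVE_EXPIRY:
                connection.close()  # assumes connections expose close()
                del self._idle_since[connection]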

Design Multipart helper to stream by default

This is a frequent problem with urllib3: the default behavior loads all of the file's data into memory at once before sending, resulting in huge files being loaded into memory.

It's a hard problem to fix in urllib3 because of how the API was designed, but we can still fix our implementation not to do this. I've got part of a patch for this currently.
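
One way to avoid the buffering, sketched as a generator below (a hypothetical helper, not the patch mentioned above), is to read each file in fixed-size chunks so the request body becomes an iterator that the transport can consume without ever holding the whole file in memory:

def iter_file_part(fileobj, chunk_size=64 * 1024):
    # Read the file in fixed-size chunks instead of fileobj.read() in one go.
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk

The full encoder would then be a generator too, interleaving the per-field multipart headers with these chunks.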

HTTP/2 Upgrade support

Currently we only support HTTP/2 on https connections.

Establishing HTTP/2 on http works a little differently in that it starts with a regular HTTP/1.1 connection, and includes an Upgrade header. If the server responds with an appropriate upgrade response, then we switch to HTTP/2.

This would require a little bit of work, since we'd need the HTTP11Connection class to be able to call back into the Connection in order to indicate that an upgrade has been performed.
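
For reference, the cleartext upgrade handshake defined by RFC 7540 ("h2c") looks like this on the wire:

GET / HTTP/1.1
Host: example.com
Connection: Upgrade, HTTP2-Settings
Upgrade: h2c
HTTP2-Settings: <base64url-encoded SETTINGS payload>

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: h2c

[ ...the connection continues with the HTTP/2 preface and frames... ]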

Add WSGI and ASGI dispatchers

We should allow clients to trivially plug into WSGI and ASGI apps.
This will allow:

  • Using httpcore as a test client.
  • Stubbing out third-party-services during tests, rather than making live network requests.

We should support the following:

def hello_world(environ, start_response):
    ...

client = Client(app=hello_world)

And...

async def hello_world(scope, receive, send):
    ...

client = Client(app=hello_world)

This is easy enough for us to do, since we've already got a dispatch=... interface, so we'll want something like this:

import inspect

if app is not None:
    if inspect.iscoroutinefunction(app):
        dispatch = ASGIDispatcher(app)  # ASGI callables are coroutine functions
    else:
        dispatch = WSGIDispatcher(app)

We'll then need ASGIDispatcher and WSGIDispatcher implementations, which implement Dispatcher.send().

API design question - `Response.url`

Currently our Response.url attribute exposes a URL instance.

This is a breaking change from the requests API where it just exposes a plain string.

It's feasible that we should instead only be exposing plain string URLs, in order to aim for drop-in replacement API compatibility w/ requests, and in order to keep the API surface area low.

Options here are:

  • Expose response.url as a URL instance. (Richer information; the URL class is also useful in its own right.)
  • Expose response.url as a str. (Better API compat. Lower API surface area to maintain.)
  • Expose response.url as a str, and response.urlinfo as a URL instance. (Better API compat. High API surface area.)

DataField.render_data fails when value is not string

DataField only likes str objects as values. DataField.render_data will produce an error when e.g. int or bytes objects are part of posted data or form, as well as files.

  File ".../http3/multipart.py", line 29, in render_data
    return self.value.encode("utf-8")
AttributeError: 'int' object has no attribute 'encode'

Test to reproduce

@pytest.mark.parametrize(("value"), (123, b"abc"))
def test_multipart_non_str(value):
    client = Client(dispatch=MockDispatch())
    data = {"text": value}
    files = {"file": io.BytesIO(b"<file content>")}
    response = client.post("http://127.0.0.1:8000/", data=data, files=files)
    assert response.status_code == 200

Or with this code:

import http3, io

http3.post(
    url="https://httpbin.org/post",
    data={"a": b"1"},
    files={"file": io.BytesIO(b"<file content>")},
)
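
A minimal sketch of a fix, coercing primitive values before encoding (this render_data is a hypothetical replacement, not the actual patch):

def render_data(self) -> bytes:
    # Pass bytes through untouched; encode str; coerce other primitives
    # (int, float, bool) via str() before encoding.
    if isinstance(self.value, bytes):
        return self.value
    if isinstance(self.value, str):
        return self.value.encode("utf-8")
    return str(self.value).encode("utf-8")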

100% Test coverage

Let's get the test coverage up to 100%, and then force-pin it.

Any contributions towards this are welcome.

Add `params=...` to API.

All of our .request(), .get(), .put() etc... APIs should accept a params dict, and should modify the URL if it is provided.

In fact it'd actually make sense for the first pass on this to be adding params=... as an optional argument to the URL class.

This ticket would be a good one for a contributor to jump onto, since it's nicely bounded. Start with a PR only for adding params to the URL class. Then once that's in, we can consider a follow-up PR for adding it to the API.

Implementations should double-check requests' handling of params (eg. what exactly does it do with non-string values?).

My preference here would probably be that JSON-able scalar values are supported, and should map onto JSON equivalents. So: none, true, false, ints, floats, str.
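
A minimal sketch of the URL-side behavior under that preference, as a standalone helper using only the standard library:

from urllib.parse import urlencode

def add_params(url: str, params: dict) -> str:
    def to_text(value):
        # Map JSON-able scalars onto their JSON equivalents.
        if value is None:
            return "null"
        if value is True:
            return "true"
        if value is False:
            return "false"
        return str(value)

    query = urlencode({key: to_text(value) for key, value in params.items()})
    return url + ("&" if "?" in url else "?") + query

print(add_params("https://example.org/search", {"q": "httpx", "page": 1}))
# https://example.org/search?q=httpx&page=1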

Sync or Async first

We might well want to switch things around, and have httpcore.Client be the standard threaded case. This'd particularly make sense if we include #50 as part of the functionality that gets built out.

Disable TLSv1 and TLSv1.1 by default

TLSv1.2 was released over 10 years ago and is widely deployed.

The Alexa Top 1 Million Analysis from February 2018 shows that for the sites surveyed, the vast majority support TLSv1.2 (98.9 percent), with a mere 0.8 percent using TLSv1.0 and an even smaller percentage using TLSv1.1.

Quote from IETF Draft on Deprecating TLSv1 and TLSv1.1 from 2018

We may be able to tighten up the ciphers we're enabling by default as well as a result of this change.

Add `data=...` to API.

An initial pass here would probably only support URL-encoded data.

We probably want to change things around so that our current content argument supporting bytes or async bytes iterator becomes data instead, and supports bytes, or async bytes iterator, or dict.

A later follow up to this will probably be also supporting str as an option, and encoding to Content-Type: text/plain; charset=utf-8. (But let's keep that as a separate follow-on to this issue)
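
For the dict case, the body would just be standard form encoding, e.g.:

from urllib.parse import urlencode

data = {"name": "httpx", "version": 1}
body = urlencode(data).encode("ascii")  # b'name=httpx&version=1'
headers = {"Content-Type": "application/x-www-form-urlencoded"}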

Try connecting to HTTPS on schemaless URLs

urllib3's behavior with schemaless URLs was to connect via http. Maybe we can have some logic such that, if verify=True is active and we receive a schemaless URL (either from a user or a redirect), we try https first before falling back to http? This would have to be implemented within the connection retry logic.
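
A rough sketch of that fallback logic (the helper and the exception type are placeholders, purely illustrative):

def request_schemaless(client, url: str):
    # Try https first for schemaless URLs, then fall back to http.
    try:
        return client.request("GET", "https://" + url)
    except ConnectionError:  # stand-in for whatever connect-failure exception applies
        return client.request("GET", "http://" + url)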

Blockers for 1.0 release.

List of things that I currently think are required for a 1.0 release.

We should try to keep this scope as narrow as possible. Anything related to the external API is fair game, plus any critical level implementation issues.

Anything else that doesn't meet those criteria should be regarded as not strictly required for 1.0.

  • Implement parallel requests. (See #52)
  • Make URL support str comparison. (See #113) Easy contributor issue
  • Finesse URL interface, focusing on str vs bytes and escaped vs non-escaped. (eg. nail down questions like - does authority return a string that's idna-escaped or not?)
  • Finesse stream interfaces. (Both upload and download. Eg. response.raw and response.stream are currently a bytes async iterator. I think we probably want it to be a stream interface. Related to #123, #24)
  • Review auth and/or retry interface. (We have a dispatcher interface already, but it's overly involved for what we really want here. Requests uses callback hooks for more complex cases here, see #136 (comment), but I'm not super keen on that. A simple middleware interface might be appropriate(?) and would be sufficient for both retries and auth requirements. Related to #295, #345.)
  • Review potentially critical issue #96
  • I don't think the read/write timeout switchover is quite right yet. (Related to #191)
  • I think we've got some outstanding race conditions if a Client is used w/ multi-threading. asgiref gets it right - need to take the same kind of approach.
  • Finish documentation. Esp "Advanced" section, and API docs. (Related to #343)
  • Ensure alternate concurrency backends are adequately supported. (Not necessarily in core, but really would like to concretely demonstrate that we're providing adequate API to support that use-case) Refs #120
  • Review exception hierarchy, and any API on exceptions. (Related to #141)
  • Review dispatch interface.

Am I missing anything else that strictly needs to get onto this list?

Implement http_version attribute on BaseResponse object

Currently, I am utilizing requests, which under the hood uses urllib3, whose HTTPResponse object contains a .version attribute, which is an int, ten times greater than the actual HTTP version (i.e. 10 for 1.0, 11 for 1.1, etc).
I currently use it for logging my requests and responses in debug mode.

I think it would be good to implement a similar attribute (either version or, more specifically, http_version) on the BaseResponse object, for similar purposes, but perhaps using an actual float. What are your thoughts?

Proposal:
BaseResponse.http_version: float = 0.0 # Could be a floatEnum with known HTTP versions

HTTP/2 connection implementation is currently flaky.

Is it me?
Or does the simplest test fail:

 httpcore.Client().get("https://www.google.com")

outputs:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/httpcore/concurrency.py", line 65, in read
    self.stream_reader.read(n), timeout.read_timeout
  File "/usr/lib/python3.7/asyncio/tasks.py", line 423, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/httpcore/dispatch/connection_pool.py", line 114, in send
    request, stream=stream, ssl=ssl, timeout=timeout
  File "/usr/local/lib/python3.7/dist-packages/httpcore/dispatch/connection.py", line 53, in send
    request, stream=stream, timeout=timeout
  File "/usr/local/lib/python3.7/dist-packages/httpcore/dispatch/http2.py", line 77, in send
    await response.read()
  File "/usr/local/lib/python3.7/dist-packages/httpcore/models.py", line 684, in read
    self._content = b"".join([part async for part in self.stream()])
  File "/usr/local/lib/python3.7/dist-packages/httpcore/models.py", line 684, in <listcomp>
    self._content = b"".join([part async for part in self.stream()])
  File "/usr/local/lib/python3.7/dist-packages/httpcore/models.py", line 695, in stream
    async for chunk in self.raw():
  File "/usr/local/lib/python3.7/dist-packages/httpcore/models.py", line 712, in raw
    async for part in self._raw_stream:
  File "/usr/local/lib/python3.7/dist-packages/httpcore/dispatch/http2.py", line 121, in body_iter
    event = await self.receive_event(stream_id, timeout)
  File "/usr/local/lib/python3.7/dist-packages/httpcore/dispatch/http2.py", line 131, in receive_event
    data = await self.reader.read(self.READ_NUM_BYTES, timeout)
  File "/usr/local/lib/python3.7/dist-packages/httpcore/concurrency.py", line 68, in read
    raise ReadTimeout()
httpcore.exceptions.ReadTimeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/manatlan/Bureau/reqman/reqman.py", line 1414, in <module>
    httpcore.Client().get("https://www.google.com")
  File "/usr/local/lib/python3.7/dist-packages/httpcore/client.py", line 496, in get
    timeout=timeout,
  File "/usr/local/lib/python3.7/dist-packages/httpcore/client.py", line 472, in request
    timeout=timeout,
  File "/usr/local/lib/python3.7/dist-packages/httpcore/client.py", line 665, in send
    timeout=timeout,
  File "/usr/lib/python3.7/asyncio/base_events.py", line 573, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.7/dist-packages/httpcore/client.py", line 287, in send
    allow_redirects=allow_redirects,
  File "/usr/local/lib/python3.7/dist-packages/httpcore/client.py", line 313, in send_handling_redirects
    request, stream=stream, ssl=ssl, timeout=timeout
  File "/usr/local/lib/python3.7/dist-packages/httpcore/dispatch/connection_pool.py", line 117, in send
    self.active_connections.remove(connection)
  File "/usr/local/lib/python3.7/dist-packages/httpcore/dispatch/connection_pool.py", line 64, in remove
    del self.all[connection]
KeyError: <httpcore.dispatch.connection.HTTPConnection object at 0x7fadbf88e0f0>

btw, I can't catch the exception?!

try:
    httpcore.Client().get("https://www.google.com")
except concurrent.futures._base.TimeoutError:
    print(1)
except httpcore.exceptions.ReadTimeout:
    print(2)

Add base_url to Client

Client should support an optional base_url. If provided then .get() etc should accept relative URLs such as “/homepage”, which should automatically be joined against the base url.
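
Usage would then look something like this (hypothetical, matching the proposal):

client = Client(base_url="https://example.com")
r = client.get("/homepage")  # resolves to https://example.com/homepage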

Community discussion

Although I've not yet done much in the way of documentation or branding this package up, it's quickly narrowing in on a slice of functionality that's lacking in the Python HTTP landscape.

Specifically:

  • A requests-compatible API.
  • With HTTP/2 and HTTP/1.1 support (and an eye on HTTP/3)
  • A standard sync client by default, but the option of an async client, if you need it.
  • An API for making requests in parallel. The standard case has a regular sync API on the front, but using async under the hood. Also an equivalent for the fully async case. (Both draw on Trio a little for lessons on how to manage branching, exceptions & cancellations.)
  • Ability to plug directly into WSGI or ASGI apps, instead of network dispatch. (For usage as a test client, or implementing stub services during test or CI)
  • Ability to plug into alternate concurrency backends.

I've also some thoughts on allowing the Request and Response models to provide werkzeug-like interfaces, allowing them to be used either client-side or server-side. One of the killer-apps of the new async+HTTP/2 functionality is allowing high-throughput proxy and gateway services to be easily built in Python. Having a "requests"-like package that can also use the models on the server side is something I may want to explore once all the other functionality is sufficiently nailed down.

Since "requests" is an essential & necessary part of the Python ecosystem, and since this package is aiming to be the next step on from that, I think it's worth opening up a bit of community discussion here, even if it's early days.

I'd originally started out expecting httpcore to be a silent-partner dependency of any requests3 package, but it progressed fairly quickly from there into "actually I've got a good handle on this, I think I need to implement this all the way through". My biggest questions now are around what's going to be the most valuable way to deliver this work to the community.

Ownership, Funding & Maintenance

Given how critical a requests-like HTTP client is to the Python ecosystem as a whole, I'd be amenable to community discussions around ownership & funding options.

I guess that I need to start out by documenting & pitching this package in its own right, releasing it under the same banner and model as all the other Encode work, and then take things from there if and when it starts to gain any adoption.

I'm open to ideas from the urllib3 or requests teams, if there's alternatives that need to be explored early on.

Requests

The functionality that this package is homing in on meets the requirements for the proposed "Requests III". Perhaps there's something to be explored there, if the requests team is interested, and if we can find a good community-focused arrangement around funding & ownership.

urllib3

The urllib3 team obvs. have a vast stack of real-world usage expertise that'd be important for us to make use of. There's bits of work that urllib3 does, that httpcore likely needs to do, including fuzziness around how content decoding actually ends up looking on the real, messy web. Or, for example, pulling early responses before necessarily having fully sent the outgoing request.

Something else that could well be valuable would be implementing a urllib3 dispatch class alongside the existing h11/h2/async dispatch. Any urllib3 dispatch class would still be built on top of the underlying async structure, but would dispatch the urllib3 calls within a threadpool.

Doing so would allow a couple of useful things, such as being able to isolate behavioral differences between the two implementations, or perhaps allowing a more gradual switchover for critical services that need to take a cautious approach to upgrading to a new HTTP client implementation.

Trio, Curio

I think httpcore as currently delivered makes it fairly easy to deliver a trio-based concurrency backend. It's unclear to me if supporting that in the package itself is a good balance, or if it would be more maintainable to ensure that the trio team have the interfaces they need, but that any implementation there would live within their ecosystem.

(I'd probably tend towards the latter case there.)

Twisted

I guess that an HTTP/2 client would probably be useful to the Twisted team. I don't really know enough about Twisted's style of concurrency API to take a call on if there's work here that could end up being valuable to them.

HTTP/3

It'll be worth us keeping an eye on https://github.com/aiortc/aioquic

Having a QUIC implementation isn't the only thing that we'd need in order to add HTTP/3 support, but it is a really big first step.

We currently have connect/reader/writer interfaces. If we added QUIC support then we'd want our protocol interfaces to additionally support operations like "give me a new stream", and "set the flow control", "set the priority level".

For standard TCP-based HTTP/2 connections, "give me a new stream" would always just return the existing reader/writer pair. For QUIC connections it'd return a new reader/writer pair for a protocol-level stream.

This is getting way ahead of ourselves, but I think we've probably got a good basis here to be able to later support HTTP/3.

One big blocker would probably be whatever HTTP-level changes are required between HTTP/2 and HTTP/3. The diff between QPACK and HPACK is one case here, but there are likely also differences given that the stream framing in HTTP/2 is at the HTTP level, whereas the stream framing in HTTP/3 is at the transport level.

It's unclear to me if these differences are sufficiently incremental that they could fall into the scope of a future hyper/h2 package or not, or what the division of responsibilities would look like.

One important point to draw out here is that the growing complexities from HTTP/1.1, to HTTP/2, to HTTP/3, mean that the Python community is absolutely going to need to tackle work in this space as a team effort - the layers in the stack need expertise in various differing areas.

Certificates

Right now we're using certifi for certificate checking. Christian Heimes has been doing some work in this space around accessing interfaces to the Operating System's certificate store. I might try to collar him at PyLondinium.

Any other feedback?

I'm aware that much of this might look like it's a bit premature, but the work is pretty far progressed, even if I've not yet started focusing on any branding and documentation around it.

Are there other invested areas of the Python community that I'm not yet considering here?

Where are the urllib3, trio, requests, aiohttp teams heading in their own work in this space? Is there good scope for collaboration, and how do you think that could/should work?

What else am I missing?

Multipart helper

The first part of this would be a class that just implements taking a files/data dict, and returning the byte-wise multipart content, plus the Content-Type header.

Places to start with this include urllib3, and the requests-toolbelt streaming multipart encoder.
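
A minimal non-streaming sketch of that first part, using only the standard library (the helper name and details are illustrative):

import binascii
import os

def encode_multipart(data: dict, files: dict):
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = binascii.hexlify(os.urandom(16)).decode()
    parts = []
    for name, value in data.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + str(value).encode() + b"\r\n")
    for name, fileobj in files.items():
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{name}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        )
        parts.append(header.encode() + fileobj.read() + b"\r\n")  # note: buffers the file
    parts.append(f"--{boundary}--\r\n".encode())
    body = b"".join(parts)
    content_type = f"multipart/form-data; boundary={boundary}"
    return body, content_type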

Wait for connection close

Sync support for .raw() doesn't work just yet because the transport close is implemented as a "call_soon", and we're currently not using transport.wait_closed, meaning that the asyncio loop ends up being closed before the pending task completes.

TypeErrors against public interfaces.

Partly prompted by this comment.

I think we should probably be a bit strict about enforcing types passed to our public interfaces. It'd be nice to raise loud, clear errors, rather than failing with some arbitrary AttributeError in the middle of the codebase if a user passes something completely invalid to the interface. This would be stricter than what most Python libraries do, but I think it might be a good trade-off for us, to have a very strict and clearly demarcated API.

That'd be easy enough to do since there's not too many points we'd need to enforce.
The big obvious one would be checking everything in AsyncClient.request(), which most of the rest of the public API calls into.

There would be some other bits & pieces to check too, e.g. enforcing that streaming data iterates through bytes/str and not any other types, that dispatchers really do return a response type, and type checking on the various models that we expose.

Perhaps I'm being over-zealous here? Any thoughts?

Handle early responses

Ideally our HTTP/1.1 and HTTP/2 cases should be able to handle pulling responses before they've finished sending the entire request.

Some server implementations could deadlock on large request bodies if we strictly require that the entire request has been sent before reading the response.

Event loop gets stuck awaiting the request after the last change to requests-async/http3

I've discovered an interesting behaviour recently.
The matter is that the code below starts getting stuck awaiting the request since the bump of requests-async to 0.5.0 (0.4.1 works well), but the only meaningful change in encode/requests-async#48 is the bump of httpcore/http3 to 0.3.0 (if I'm not mistaken). That's why I decided to open an issue here.

I'm wondering if it could be possible to clarify if I'm missing something.
Happy to provide any assistance with investigation/debugging/fixing.

The code example:

import asyncio
import uvloop
import requests_async

myurls = [
    "https://google.com",
    "https://google.com",
]

async def async_get(url):
    resp = await requests_async.get(url)
    return resp.status_code

async def gather_run():
    tasks = [async_get(x) for x in myurls]
    # Await a coroutine
    result_1 = await tasks[0]
    # Or gather them all
    results = await asyncio.gather(*tasks, return_exceptions=True)
    print(f'Results: {results}')
    return results

uvloop.install()
asyncio.run(gather_run())

Note: it behaves the same without uvloop.

Failing to detect lost connections (asyncio only)

1 of 5 requests I initiated would raise ConnectionResetError('Connection lost') when calling AsyncClient from inside Starlette and Uvicorn.

In comparison, urllib, requests and Tornado's AsyncClient work as expected.

Stacktrace:

  File "/project/venv/lib/python3.7/site-packages/http3/client.py", line 145, in send_handling_redirects
    request, verify=verify, cert=cert, timeout=timeout
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/connection_pool.py", line 121, in send
    raise exc
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/connection_pool.py", line 116, in send
    request, verify=verify, cert=cert, timeout=timeout
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/connection.py", line 59, in send
    response = await self.h11_connection.send(request, timeout=timeout)
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/http11.py", line 53, in send
    await self._send_event(event, timeout)
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/http11.py", line 101, in _send_event
    await self.writer.write(data, timeout)
  File "/project/venv/lib/python3.7/site-packages/http3/concurrency.py", line 89, in write
    self.stream_writer.drain(), timeout.write_timeout
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/tasks.py", line 388, in wait_for
    return await fut
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/streams.py", line 348, in drain
    await self._protocol._drain_helper()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/streams.py", line 202, in _drain_helper
    raise ConnectionResetError('Connection lost')

Parallel requests

We could consider adding a concurrency API to simplify making multiple requests in parallel.

I'll give sync examples here, but we'd also have equivalent async cases too.

If you have a number of requests that you'd like to send in parallel, then...

with client.parallel() as parallel:
    homepage = parallel.get("http://example.com/homepage")
    another = parallel.get("http://example.com/another")
    response = homepage.get_response()
    response = another.get_response()

Alternatively, if you don't need to care about getting the responses back out-of-order, then:

with client.parallel() as parallel:
    for page_number in range(0, 10):
        parallel.get(f"http://example.com/{page_number}")
    while parallel.pending:
        response = parallel.next_response()

Nice things here:

  • Bring the goodness of async's lightweight parallelization to standard threaded code.
  • For the async case, users don't need to touch lower-level flow-forking primitives. asyncio.gather or whatevs.

Auth support

We currently have adapters/authentication.py stubbed out for this.

The client will need to create an AuthConfig which should be bound to the origin of the initial request. The auth should be included during prepare_request, if the origin matches.
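
A minimal sketch of the origin check described above (the names are hypothetical):

def should_apply_auth(auth_origin, request_url) -> bool:
    # Only attach credentials when scheme, host, and port all match the
    # origin the auth config was bound to.
    return (request_url.scheme, request_url.host, request_url.port) == auth_origin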

Text streaming

Currently all of the streaming APIs we're providing are byte-wise. We ought to also support text-decoding interfaces.

This would likely mean that we'd want stream(text=True). Difficulties here are that we don't have .apparent_encoding available to us prior to the content being available. We'd need to use chardet's incremental detection, as well as an incremental decoder.

I'd suggest we'd want to implement this by adding a TextDecoder(encoding=None) class to decoders.py that takes the charset encoding as its initial argument. It would implement the same interface as the existing decoders, except returning str.

Aside: It's possible that in the future we might start to move more into line with requests' iter_content() and iter_lines().
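
A minimal sketch of such a TextDecoder, built on the standard library's incremental decoders (chardet's incremental detection is stubbed out here with a utf-8 fallback):

import codecs

class TextDecoder:
    """Incrementally decode a byte stream to str."""

    def __init__(self, encoding=None):
        # With encoding=None we'd feed bytes to chardet's incremental
        # detector before committing; utf-8 stands in for that here.
        self.decoder = codecs.getincrementaldecoder(encoding or "utf-8")(errors="replace")

    def decode(self, data: bytes) -> str:
        return self.decoder.decode(data)

    def flush(self) -> str:
        return self.decoder.decode(b"", final=True)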

Tighten up `.next()` behavior

Currently .next is set as an attribute on the response.

Really it'd be good to have:

  • An explicit async def next(self): method
  • A .call_next attribute that the RedirectAdapter can set.

This'll mean we can have more neatly guarded behavior...

async def next(self):
    if not self.is_redirect:
        raise NotRedirectResponse(...)
    if self.call_next is None:
        raise CallNextNotSet(...)
    return await self.call_next()

h11._util.RemoteProtocolError: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE

I intermittently got this error when load testing a uvicorn endpoint.

This error comes from a proxy endpoint where I am also using encode/http3 to perform HTTP client calls.

  File "/project/venv/lib/python3.7/site-packages/http3/client.py", line 365, in post
    timeout=timeout,
  File "/project/venv/lib/python3.7/site-packages/http3/client.py", line 497, in request
    timeout=timeout,
  File "/project/venv/lib/python3.7/site-packages/http3/client.py", line 112, in send
    allow_redirects=allow_redirects,
  File "/project/venv/lib/python3.7/site-packages/http3/client.py", line 145, in send_handling_redirects
    request, verify=verify, cert=cert, timeout=timeout
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/connection_pool.py", line 121, in send
    raise exc
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/connection_pool.py", line 116, in send
    request, verify=verify, cert=cert, timeout=timeout
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/connection.py", line 59, in send
    response = await self.h11_connection.send(request, timeout=timeout)
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/http11.py", line 65, in send
    event = await self._receive_event(timeout)
  File "/project/venv/lib/python3.7/site-packages/http3/dispatch/http11.py", line 109, in _receive_event
    event = self.h11_state.next_event()
  File "/project/venv/lib/python3.7/site-packages/h11/_connection.py", line 439, in next_event
    exc._reraise_as_remote_protocol_error()
  File "/project/venv/lib/python3.7/site-packages/h11/_util.py", line 72, in _reraise_as_remote_protocol_error
    raise self
  File "/project/venv/lib/python3.7/site-packages/h11/_connection.py", line 422, in next_event
    self._process_event(self.their_role, event)
  File "/project/venv/lib/python3.7/site-packages/h11/_connection.py", line 238, in _process_event
    self._cstate.process_event(role, type(event), server_switch_event)
  File "/project/venv/lib/python3.7/site-packages/h11/_state.py", line 238, in process_event
    self._fire_event_triggered_transitions(role, event_type)
  File "/project/venv/lib/python3.7/site-packages/h11/_state.py", line 253, in _fire_event_triggered_transitions
    .format(event_type.__name__, role, self.states[role]))
h11._util.RemoteProtocolError: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE

Implement http+unix dispatcher

Lots of users of Docker, Sentry, etc. use http+unix://... URLs. This might necessitate an additional backend interface if we intend it to be first-class support.
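
For reference, the underlying mechanics are just HTTP/1.1 spoken over an AF_UNIX socket; a bare-bones sketch (the socket path is illustrative):

import socket

def unix_get(socket_path: str, path: str) -> bytes:
    # Speak plain HTTP/1.1 over a unix domain socket.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        request = f"GET {path} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode())
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)

# e.g. unix_get("/var/run/docker.sock", "/version")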

Finesse json

  • Detect encoding from initial bytes, rather than just using "utf-8". (See the sketch below.)
  • Allow additional arguments to response.json() and review any subtle API diffs to requests.
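
On the first point: RFC 4627 shows how to sniff the encoding from the opening bytes, since a JSON text begins with two ASCII characters. A minimal sketch:

import codecs

def detect_json_encoding(data: bytes) -> str:
    # BOMs win outright.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF32_LE) or data.startswith(codecs.BOM_UTF32_BE):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # Otherwise the pattern of NUL bytes in the first four bytes tells us.
    if len(data) >= 4:
        if data[0] == 0 and data[1] == 0:
            return "utf-32-be"   # 00 00 00 xx
        if data[0] == 0:
            return "utf-16-be"   # 00 xx 00 xx
        if data[1] == 0 and data[2] == 0:
            return "utf-32-le"   # xx 00 00 00
        if data[1] == 0:
            return "utf-16-le"   # xx 00 xx 00
    return "utf-8"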

Retry requests

urllib3 has a very convenient Retry utility (docs) that I have found to be quite useful when dealing with flaky APIs. http3's Clients don't support this sort of thing yet, but I would love it if they did!

In the meantime, I can probably work out my own with a while loop checking the response code.
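
For example, a simple hand-rolled retry loop with exponential backoff (purely illustrative, not an http3 feature):

import time

def get_with_retries(client, url, retries=3, backoff=0.5):
    # Retry on 5xx responses, sleeping a little longer after each attempt.
    for attempt in range(retries + 1):
        response = client.get(url)
        if response.status_code < 500 or attempt == retries:
            return response
        time.sleep(backoff * (2 ** attempt))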
