Comments (4)
Adding some more context after further digging. The reason this isn't more widely impacting is that it only affects code subclassing HTTPConnection in urllib3. In urllib3<2.2, headers were always left in their provided representation and a copy was created.
botocore has a ~10 year old usage of `HTTPConnection` where `request` is overridden to handle HTTP 100-continue responses. This was the earlier sticking point for upgrading botocore to urllib3 2.0.
With the recent change in 4ece59b, the type being passed to `request` is now different, leading to code accessing headers failing with the above error. This is avoided in the common case because urllib3's `request` implementation uses `items()` when iterating headers and forwards any bytes values to httplib's `putheader` for handling. `items()` appears to be one of the only interfaces on `HTTPHeaderDict` that behaves when bytes are provided.
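To illustrate the `items()` path described above, here is a minimal sketch using only the standard library's `http.client`. It is not urllib3's actual `request` implementation: the helper name `forward_headers`, the header names, and the `example.invalid` host are all made up for illustration, and a plain `dict` stands in for the headers object. No network connection is opened, since `putrequest` and `putheader` only buffer output.

```python
import http.client


def forward_headers(conn, headers):
    """Forward each header to putheader(), leaving values as provided."""
    count = 0
    for name, value in headers.items():
        # http.client's putheader() accepts both str and bytes values:
        # str is encoded as latin-1, bytes are passed through unchanged.
        conn.putheader(name, value)
        count += 1
    return count


# Hypothetical host; nothing is sent because endheaders() is never called.
conn = http.client.HTTPConnection("example.invalid")
conn.putrequest("GET", "/")
forwarded = forward_headers(conn, {"X-Text": "value", "X-Raw": b"caf\xc3\xa9"})
```

Code that only iterates with `items()` like this never inspects the value type, which is presumably why the common path keeps working while subclass code that reads headers differently does not.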
While HTTPConnection isn't explicitly part of the public interface in 2.x, it appears to have been when botocore's code was written. Shazow created the separate connection.py module for this use case in #254 at the request of a similar use case in #253.
I don't know what the current maintenance team wants to do in this case or what the current expectations are around this behavior. I think the options I'm seeing currently are:
- Revert the type change back to `.copy()` to preserve the previous functionality.
- Correct the behavior in `HTTPHeaderDict` when bytes values are provided. I think it can be argued the current inconsistencies aren't expected and might be fixed/improved regardless.
- Attempt to update the workflow in botocore to remove the usage of bytes. It's not immediately clear what the scope of impact would be there since there are other widely used libraries that have some reliance on the current behavior.
from urllib3.
There is already an existing ticket + PR to accept bytes values in the HTTPHeaderDict:
Would be interesting to see if that PR on its own would fix this issue.
Thanks for pointing those out, @bblommers! Yeah, I agree those seem to be in line with this issue. Anecdotally from Requests, I'll say guessing the encoding is usually the wrong decision (especially around latin-1). If we get bytes, they should probably keep going over the wire as they were received. Otherwise you end up with data corruption between latin-1 and utf-8.
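As a small illustration of that corruption, here is a sketch using only built-in string methods. The sample value `café` is made up; the point is just what happens when UTF-8 bytes are decoded with a latin-1 guess and later re-encoded as UTF-8.

```python
original = "café"

# The bytes as the caller provided them (UTF-8 encoded).
wire = original.encode("utf-8")       # b'caf\xc3\xa9'

# A library guessing latin-1 never fails, but it produces mojibake.
guessed = wire.decode("latin-1")      # 'cafÃ©'

# Re-encoding the guess as UTF-8 no longer matches the original bytes.
resent = guessed.encode("utf-8")      # b'caf\xc3\x83\xc2\xa9'

# Passing the bytes through untouched avoids the problem entirely.
passthrough = wire
```

The latin-1 decode always succeeds, so nothing signals the mistake until the mismatched bytes reach the other side.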
It looks like Quentin already covered some of that in #3279. I agree that having mixed values for the same key would be an exception case (although perhaps it could be explicitly handled rather than failing on the join):
> So my current thinking is that we should indeed join byte values using `b", "` and forbid mixed encoding and bytes (to be able to join). And not try to decode bytes to latin-1.
Edit: I can confirm #3279 does appear to resolve this issue when rebased onto main.
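For readers following along, the joining rule quoted above could look roughly like the sketch below. This is not the code from #3279; `join_header_values` is a hypothetical helper written only to show the policy: join bytes with `b", "`, join strings with `", "`, and reject a mix explicitly instead of failing inside the join.

```python
from typing import Sequence, Union

HeaderValue = Union[str, bytes]


def join_header_values(values: Sequence[HeaderValue]) -> HeaderValue:
    """Join repeated header values without guessing an encoding."""
    if not values:
        raise ValueError("no header values to join")
    if all(isinstance(v, bytes) for v in values):
        # All bytes: join as bytes so the data goes out exactly as received.
        return b", ".join(values)
    if all(isinstance(v, str) for v in values):
        return ", ".join(values)
    # Mixed str and bytes would require decoding or encoding one side.
    raise TypeError("cannot join mixed str and bytes header values")
```

Raising `TypeError` for the mixed case is one design choice among several; the thread leaves open how that case should be handled.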
Thanks for the investigation!
We need to review and merge #3279, but we should also consider if we really want to cast headers to HTTPHeaderDict.