
Comments (4)

nateprewitt commented on June 19, 2024

Adding some more context after further digging. This isn't more widely felt because it only affects code that subclasses HTTPConnection in urllib3. In urllib3<2.2, headers were always left in their provided representation and a copy was created.

botocore has a ~10 year old usage of HTTPConnection where request is overwritten to handle HTTP 100-continue responses. This was the earlier sticking point for upgrading botocore to urllib3 2.0.

With the recent change in 4ece59b, the type being passed to request is now different, leading to code accessing headers failing with the above error. This is avoided in the common case because urllib3's request implementation uses items() when iterating headers and forwards any bytes values to httplib's putheader for handling. items() appears to be one of the only interfaces on HTTPHeaderDict that behaves when bytes are provided.
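To illustrate the asymmetry described above, here is a toy sketch. `ToyHeaderDict` is a hypothetical stand-in written for this example, not urllib3's actual HTTPHeaderDict code. It assumes only that item lookup joins the stored values with a str separator, while items() yields each value as stored.

```python
class ToyHeaderDict:
    """Hypothetical stand-in for illustration; NOT urllib3's HTTPHeaderDict."""

    def __init__(self):
        # Maps lowercased name -> [original name, value, value, ...]
        self._container = {}

    def add(self, key, value):
        self._container.setdefault(key.lower(), [key]).append(value)

    def __getitem__(self, key):
        values = self._container[key.lower()][1:]
        # Assumption: lookup merges repeated values with a str separator.
        return ", ".join(values)

    def items(self):
        # Yields each value exactly as stored, with no joining.
        for stored in self._container.values():
            name = stored[0]
            for value in stored[1:]:
                yield name, value


headers = ToyHeaderDict()
headers.add("Expect", b"100-continue")

# Iterating items() never joins, so the bytes value passes through untouched.
print(list(headers.items()))

# Item access joins with a str separator, which fails on a bytes value.
try:
    headers["Expect"]
except TypeError as exc:
    print("lookup failed:", exc)
```

Under that assumption, code that only iterates items() keeps working with bytes, while a subclass that looks up a single header by name hits a TypeError.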

While HTTPConnection isn't explicitly part of the public interface in 2.x, it appears to have been when botocore's code was written. Shazow created the separate connection.py module for this use case in #254 at the request of a similar use case in #253.

I don't know what the current maintenance team wants to do in this case or what the current expectations are around this behavior. I think the options I'm seeing currently are:

  1. Revert the type change back to .copy() to preserve the previous functionality.
  2. Correct the behavior in HTTPHeaderDict when bytes values are provided. I think it can be argued the current inconsistencies aren't expected and might be fixed/improved regardless.
  3. Attempt to update the workflow in botocore to remove the usage of bytes. It's not immediately clear what the scope of impact would be there since there are other widely used libraries that have some reliance on the current behavior.

from urllib3.

bblommers commented on June 19, 2024

There is already an existing ticket + PR to accept bytes values in the HTTPHeaderDict:

#3072
#3279

Would be interesting to see if that PR on its own would fix this issue.


nateprewitt commented on June 19, 2024

Thanks for pointing those out, @bblommers! Yeah, I agree those seem to be in line with this issue. Anecdotally from Requests, I'll say guessing the encoding is usually the wrong decision (especially around latin-1). If we get bytes, they should probably keep going over the wire as they were received. Otherwise you end up with data corruption between latin-1 and utf-8.
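A small standard-library sketch of that kind of corruption. The header value here is made up for illustration, and the scenario assumes the bytes are decoded as latin-1 at one layer and re-encoded as utf-8 at another.

```python
# Bytes as the caller supplied them (UTF-8 encoded, an assumption for this example).
raw = "café".encode("utf-8")

# Decoding as latin-1 never raises, but it is the wrong guess for these bytes.
guessed = raw.decode("latin-1")

# If a later layer encodes the str as UTF-8, different bytes go over the wire.
resent = guessed.encode("utf-8")

print(raw)
print(resent)
print(raw == resent)  # False

# The round trip is only lossless if the same codec is used in both directions.
print(guessed.encode("latin-1") == raw)  # True
```

Passing the original bytes through untouched avoids having to make the guess at all.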

It looks like Quentin already covered some of that in #3279. I agree that having mixed values for the same key would be an exception case (although perhaps it could be explicitly handled rather than failing on the join):

"So my current thinking is that we should indeed join byte values using b", " and forbid mixed encoding and bytes (to be able to join). And not try to decode bytes to latin-1."
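A minimal sketch of the joining rule described in that quote. `join_header_values` is a hypothetical helper named for this example; it is not taken from #3279 or from urllib3, and the choice of TypeError for the mixed case is an assumption.

```python
from typing import Sequence, Union

HeaderValue = Union[str, bytes]


def join_header_values(values: Sequence[HeaderValue]) -> HeaderValue:
    """Join repeated header values without guessing an encoding.

    Hypothetical helper: all-bytes input is joined with b", ", all-str
    input with ", ", and a mix of the two is rejected.
    """
    if not values:
        raise ValueError("no values to join")
    if all(isinstance(v, bytes) for v in values):
        return b", ".join(values)
    if all(isinstance(v, str) for v in values):
        return ", ".join(values)
    # Assumption: mixed input is an error rather than something to decode.
    raise TypeError("cannot join mixed str and bytes header values")


print(join_header_values([b"gzip", b"br"]))
print(join_header_values(["gzip", "br"]))
```

Rejecting the mixed case explicitly gives a clearer error than letting the join itself fail, which is the explicit handling suggested above.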

Edit: I can confirm #3279 does appear to resolve this issue when rebased onto main.


pquentin commented on June 19, 2024

Thanks for the investigation!

We need to review and merge #3279, but we should also consider if we really want to cast headers to HTTPHeaderDict.

