Giter Club home page Giter Club logo

Comments (6)

sbleon avatar sbleon commented on August 18, 2024

Sounds like a good addition! Go for it!

On Mon, Apr 20, 2015 at 12:45 PM, Roberto Quintanilla <
[email protected]> wrote:

Hi! I'm currently working on a website with a lot of traffic in Latin
America.

We've found some agents that publish the "User-Agent" header with invalid
characters, making our app crash. We rolled up a fix on our app to
re-process this header with the corrected string, before noticing this
library.

I'm reviewing, however, that in your middleware class, your'e not
sanitizing the User-Agent header / HTTP_USER_AGENT env key

SANITIZE_ENV_KEYS = [

.

It's that on purpose? I'm preparing a PR in case it's desired.

Reply to this email directly or view it on GitHub
#23.

from utf8-cleaner.

vovimayhem avatar vovimayhem commented on August 18, 2024

I'm having trouble right now with two use cases. In both, we're noticing the '�' character in the string, before our app crashes when doing something useful with the string:

Use Case 1: Whenever I get this User-Agent string from god-knows-which cellphone model:

require 'json'

# This is how I've managed to replicate the User-Agent string this agent is sending to our servers:
problem_user_agent = "Mozilla/5.0 (Linux; U; Android 2.3.6; es-co; XT320 Build/GRK39F) AppleWebKit/533.1 (KHTML, like Gecko) Versi\xF3n/4.0 Mobile Safari/533.1".force_encoding(Encoding::UTF_8) # Notice the '\xF3' part, should read "Versión"

# Watch this fail... the same happens whenever I try to add this string to our database:
JSON.generate(user_agent: problem_user_agent)

Use Case 2: Whenever the agent sends unescaped URL query strings:

require "json"

# This is how I've managed to replicate the query string this agent is sending to our servers:
problem_url = "http://www.example.com/search?terms=M\xE9xico".force_encoding(Encoding::UTF_8) # should read "México"

# Again watch this fail...
JSON.generate(url: problem_url)

The best way I came up with to process these strings AND preserve the data that was originally meant is by using ActiveSupport core extensions for the String class, like this:

require "json"
require "active_support/core_ext/string/multibyte"

problem_user_agent = "Mozilla/5.0 (Linux; U; Android 2.3.6; es-co; XT320 Build/GRK39F) AppleWebKit/533.1 (KHTML, like Gecko) Versi\xF3n/4.0 Mobile Safari/533.1".force_encoding(Encoding::UTF_8)

processed_user_agent = problem_user_agent.mb_chars.tidy_bytes.to_s

# Now watch it work:
JSON.generate(user_agent: processed_user_agent)

It makes sense for me to do this before the processing utf8-cleaner does. But there are 2 issues if this functionality is to be included in this gem:

  • Will require activesupport as a dependency
  • The performance of the 'tidy_bytes' method implementation in activesupport version < 4.1 is slow compared to >= 4.1, but only if using ruby >= 2.1

I don't know at this point if it's useful for users of this gem to include this (if so, I can PR), but in the meantime I'll just create a new Rack Middleware class to process the data and then let the utf8-cleaner to do it's stuff.

from utf8-cleaner.

vovimayhem avatar vovimayhem commented on August 18, 2024

Gist with the Rack Middleware I generated: https://gist.github.com/vovimayhem/f7bc4b8f06fefa798374

from utf8-cleaner.

sbleon avatar sbleon commented on August 18, 2024

I didn't know about tidy_bytes. That is awesome!

It looks like you're tackling a couple of different problems:

  1. Fixing non-ASCII characters in some common non-UTF8 encodings.
  2. URI-encoding inputs that should have been encoded by the user agent but were not.

I'd like to see both of these incorporated into this gem, with suitable test cases, and probably in separate PRs.

from utf8-cleaner.

sbleon avatar sbleon commented on August 18, 2024

@vovimayhem I believe this addressed in the new version v0.1.1. Thanks for the suggestion about tidy_bytes!

from utf8-cleaner.

007lva avatar 007lva commented on August 18, 2024

Great!, I go to update because I have the same problem. Thanks!

from utf8-cleaner.

Related Issues (18)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.