Comments (10)

SadieCat commented on September 25, 2024

The UTF8ONLY token only exists to let clients detect that the server is UTF-8 only. It is backwards compatible with the existing situation where servers that require UTF-8 silently break with clients which are not configured to use UTF-8.
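As a rough illustration of the detection described above, a client could look for the token in the server's `RPL_ISUPPORT` (`005`) line. This is a minimal sketch, not taken from any real client: the function name `server_is_utf8_only` and the sample server name are hypothetical, and it assumes a line without IRCv3 message tags.

```python
def server_is_utf8_only(isupport_line: str) -> bool:
    """Return True if a raw RPL_ISUPPORT (005) line advertises UTF8ONLY.

    Hypothetical helper: assumes the line carries no message tags.
    """
    parts = isupport_line.rstrip("\r\n").split(" ")
    if parts and parts[0].startswith(":"):
        parts = parts[1:]  # drop the optional source prefix
    if len(parts) < 3 or parts[0] != "005":
        return False
    tokens = []
    for part in parts[2:]:  # skip the numeric and the client's nick
        if part.startswith(":"):
            break  # trailing text such as "are supported by this server"
        tokens.append(part)
    # A token may carry a value (NAME=value); compare the name only.
    return any(tok.split("=", 1)[0] == "UTF8ONLY" for tok in tokens)
```

A client that sees the token could then switch its outgoing encoding to UTF-8 regardless of its configured default.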

The spec does not specify any required method for handling clients that send non-UTF-8. It's entirely legal under the spec for implementations to transcode any non-UTF-8 to UTF-8 if they want.
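One possible shape of that transcoding, as a sketch only: bytes that already decode as UTF-8 pass through untouched, and anything else is reinterpreted using a fallback encoding. The helper name `to_utf8` and the choice of `cp1252` as the fallback are assumptions made for illustration; the spec does not name either.

```python
def to_utf8(payload: bytes, fallback: str = "cp1252") -> bytes:
    """Return payload as valid UTF-8 bytes.

    Assumption: non-UTF-8 input is treated as `fallback`-encoded text.
    A server cannot know the client's real encoding, so this is a guess.
    """
    try:
        payload.decode("utf-8")  # validation only
        return payload
    except UnicodeDecodeError:
        # Bytes undefined in the fallback encoding become U+FFFD.
        return payload.decode(fallback, errors="replace").encode("utf-8")
```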

from ircv3-specifications.

DanielOaks commented on September 25, 2024

> since clients that do not support the specification will happily send non-UTF8 and be disconnected for violating the protocol.

Ideally such servers would always handle these cases without disconnecting the client. However, given the amount of discussion that'd likely result from trying to specify one specific way of handling these cases, I thought it'd be best to just let the servers handle it in whatever way they find appropriate.

> To be backwards-compatible, this should be opt-in with a CAP exchange. Once a client has ACK'd UTF8ONLY, it is reasonable to expect it not to send anything that violates the UTF8ONLY specification.

Unfortunately we can't make this opt-in with a CAP, since servers that only accept UTF-8 traffic already exist and they need to transcode, reject, or in some other way handle non-UTF-8 traffic from clients in line with the definition written in the spec anyway.

> ideally it would at least say that servers SHOULD NOT drop the client for sending non-UTF8, though they may ignore individual protocol messages

Definitely makes sense to discourage disconnecting the client outright. I'll play with the language there and try to PR some alternative language that encourages that only as a last resort. Thanks for the note, much appreciated!

keaston commented on September 25, 2024

> The UTF8ONLY token only exists to let clients detect that the server is UTF-8 only. It is backwards compatible with the existing situation where servers that require UTF-8 silently break with clients which are not configured to use UTF-8.

Such servers aren't really following the spirit of the backwards-compatibility principle, so it seems harmful to endorse that approach in IRCv3. As written, it looks like a desired and encouraged part of the specification. Ideally it would at least say that servers SHOULD NOT drop the client for sending non-UTF8, though they may ignore individual protocol messages.
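To make the "ignore individual protocol messages" option concrete, here is a minimal sketch of a server-side check that drops only the offending line and tells the client why, leaving the connection open. The function name `handle_client_line` and the description text are hypothetical. The `FAIL <command> INVALID_UTF8` reply reflects my understanding of the standard-replies code associated with UTF8ONLY and should be checked against the spec before relying on it.

```python
from typing import Optional, Tuple


def handle_client_line(command: str, raw: bytes) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (line_to_process, reply_to_client) for one incoming line.

    Hypothetical policy: a line that is not valid UTF-8 is ignored and a
    FAIL reply is queued; the client stays connected either way.
    """
    try:
        raw.decode("utf-8")  # validation only
    except UnicodeDecodeError:
        reply = f"FAIL {command} INVALID_UTF8 :Message rejected, your IRC client must use UTF-8"
        return None, reply
    return raw, None
```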

slingamn commented on September 25, 2024

> Such servers aren't really following the spirit of the backwards-compatibility principle, so it seems harmful to endorse that approach in IRCv3.

It's a tricky issue, yeah. I think a compatibility break is inherent in the intent of the specification --- if a server implements the spec, it's never really going to interoperate acceptably with clients that use non-UTF8 encodings (even if you can robustly transcode input, the server will only emit UTF8, likely violating client expectations that the output encoding will agree with the input encoding).

I agree with the suggestion that disconnecting the client altogether is unnecessarily aggressive and should probably be deprecated. (From the comment history on #432, it sounds like we were exploring it as the best way to get the end user's attention.)

vanosg commented on September 25, 2024

I'll bump this issue a year later: I agree that disconnecting a client over UTF-8 seems heavy-handed, yet the UTF8ONLY spec appears to suggest it as an option. I would love to see this language removed.

DanielOaks commented on September 25, 2024

I think this change gives a more accurate explanation of why this spec exists, and also removes the disconnection language entirely. Please let me know watcha think: https://gist.github.com/DanielOaks/02a60498e4be4ecb7d6be387eecb642a/revisions#diff-014869833613b58c7e37f5208548f4e64d8d0deb465a47d1db21da761158f143=

vanosg commented on September 25, 2024

I think the changes improve the document, and appreciate the removal of the language referencing disconnection as a server option.

slingamn commented on September 25, 2024

I'm OK with removing the disconnection language, but I don't like the other changes.

> Only allowing this encoding breaks compatibility with the IRC protocol as written

Is this true? I've always thought of UTF8ONLY as being an example of a server's ability to impose a content moderation policy. In this case, non-UTF8 "payloads" (final parameters to PRIVMSG, NOTICE, USER, TOPIC, etc.) are being disallowed.
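Read as a content policy, the check would apply only to the final parameter of the commands named above, not to the whole line. A minimal sketch under that reading follows. The names `PAYLOAD_COMMANDS`, `final_parameter`, and `payload_allowed` are hypothetical, the command set holds only the four commands listed (the "etc." is left unfilled), and lines with message tags are not handled.

```python
# Assumption: only the four commands named in the comment are checked.
PAYLOAD_COMMANDS = {b"PRIVMSG", b"NOTICE", b"USER", b"TOPIC"}


def _strip_prefix(line: bytes) -> bytes:
    """Drop the line ending and the optional ':source ' prefix."""
    line = line.rstrip(b"\r\n")
    if line.startswith(b":"):
        _, _, line = line.partition(b" ")
    return line


def final_parameter(line: bytes) -> bytes:
    """Return the last parameter of a raw IRC line (the payload)."""
    body = _strip_prefix(line)
    _, sep, trailing = body.partition(b" :")
    if sep:
        return trailing  # everything after the first " :" is one parameter
    return body.split(b" ")[-1]


def payload_allowed(line: bytes) -> bool:
    """Return False only when a payload command carries a non-UTF-8 payload."""
    command = _strip_prefix(line).split(b" ", 1)[0].upper()
    if command not in PAYLOAD_COMMANDS:
        return True  # other commands are outside this policy
    try:
        final_parameter(line).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Under this sketch a `JOIN` with a non-UTF-8 channel name passes untouched, which is one way the "content policy" view differs from decoding every line as UTF-8.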

DanielOaks commented on September 25, 2024

> > Only allowing this encoding breaks compatibility with the IRC protocol as written
>
> Is this true? I've always thought of UTF8ONLY as being an example of a server's ability to impose a content moderation policy. In this case, non-UTF8 "payloads" (final parameters to PRIVMSG, NOTICE, USER, TOPIC, etc.) are being disallowed.

Depends on your view of the protocol I guess. Some do see disallowing that as a protocol break, some responses to non-UTF-8 content (e.g. disconnecting the client) would prolly classify as a protocol break, and some don't see it as a protocol break.

I guess in my view of that sentence, I'm kind of conflating the token with the 'decode everything as UTF-8' approach that some software takes, which doesn't follow the 'traditional' treat-everything-as-octets-and-bytes direction. But I guess the token and stdreplies code themselves don't necessarily mean that 🤷

slingamn commented on September 25, 2024

> some don't see it as a protocol break.

Put me in this camp :-)

I found a better way to phrase my objection: the current spec language implies that non-UTF8 is legacy and UTF8 is preferred. I like this implication and I want to keep it.
