Comments (8)

jmontane avatar jmontane commented on August 26, 2024

I found a new bug related to U+00B7 and Twitter. Please see this Tweet: https://twitter.com/unjoanqualsevol/status/469148413486194688 There are 2 valid and registered URLs

from twitter-text.

jmontane avatar jmontane commented on August 26, 2024

Hi,

The current Unicode UAX #31 mentions U+00B7 and its use in hashtags:
http://www.unicode.org/reports/tr31/#Specific_Character_Adjustments

Is there any progress on this issue, or a roadmap for it?

Regards,


montxovs avatar montxovs commented on August 26, 2024

We need to normalize the middle dot in hashtags for the Catalan language.


jmontane avatar jmontane commented on August 26, 2024

Hi,

Twitter supports hashtags with the middle dot (U+00B7). Really good news :)

There are still some issues around middle dot support in URLs:
1.- Twitter lists with L·L in the name can be created, but the URL uses a hyphen instead: L-L. See https://twitter.com/unjoanqualsevol/lists/l-l
2.- Twitter validates l·l in the domain part of URLs, but only if the scheme (http://) is declared.
3.- Twitter breaks URL links if L·L appears in the path part of the URL. Sample: https://twitter.com/BernatDedeu/status/594162637396643842

The expected behaviour in all 3 cases is the same as what is currently achieved with accented letters (à, ç, ñ...), i.e. autolinking working fine with L·L.

Please note that CMSs like WordPress don't escape the middle dot, and there are many words with L·L in the Catalan Wiktionary. See: http://ca.wiktionary.org/wiki/Categoria:Mots_en_catal%C3%A0_amb_eles_geminades


jmontane avatar jmontane commented on August 26, 2024

Just to point out one more example of URL autolinking.

See the following Tweet:
https://twitter.com/XSalaimartin/status/647004755512958977

It has a link to:
http://caffereggio.net/2015/09/24/la-economia-ante-la-independencia-del-col·lectiu-wilson-en-la-vanguardia/

But Twitter's autolinking breaks on the "·" (U+00B7) character and splits the URL:
http://caffereggio.net/2015/09/24/la-economia-ante-la-independencia-del-col
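
To make the truncation concrete, here is a minimal sketch in Python. The two patterns are hypothetical simplifications written for this comment, not twitter-text's actual regexes; they only show how an ASCII-only path character class stops at U+00B7 while a class that also admits the middle dot keeps the link whole.

```python
import re

MIDDLE_DOT = "\u00b7"

# Hypothetical, heavily simplified URL patterns (NOT twitter-text's real ones).
# The first one only allows a small set of ASCII characters in the path.
ASCII_PATH = re.compile(r"https?://[a-z0-9.-]+(?:/[A-Za-z0-9\-._~%/]*)?")

# The second one is identical, except that the path class also accepts U+00B7.
PATH_WITH_MIDDLE_DOT = re.compile(
    r"https?://[a-z0-9.-]+(?:/[A-Za-z0-9\-._~%/" + MIDDLE_DOT + r"]*)?"
)


def extract_url(pattern, text):
    """Return the first URL-like match of `pattern` in `text`, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


url = (
    "http://caffereggio.net/2015/09/24/"
    "la-economia-ante-la-independencia-del-col\u00b7lectiu-wilson-en-la-vanguardia/"
)
text = "Read this: " + url + " today"

# The ASCII-only pattern stops right before the middle dot.
print(extract_url(ASCII_PATH, text))
# The extended pattern returns the complete URL.
print(extract_url(PATH_WITH_MIDDLE_DOT, text))
```

With the ASCII-only pattern the match ends at `...independencia-del-col`, which is the same cut shown above. A real fix would need to decide where else (domain, query string) the character is acceptable.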


jmontane avatar jmontane commented on August 26, 2024

Just a funny side effect: Twitter's autolinking feature breaks Twitter's own URLs. For instance, a link to the #L·L hashtag is automagically broken if it is copied and pasted into a Tweet:

https://twitter.com/hashtag/L·L


twuttke avatar twuttke commented on August 26, 2024

Or, properly escaped if you copy it from the address bar of a modern
browser: https://twitter.com/hashtag/L%C2%B7L
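
For reference, a small sketch of how that escaped form is derived, using Python's standard `urllib.parse`. The helper name `escape_path_segment` is made up for this example; the middle dot is encoded as its two UTF-8 bytes, 0xC2 0xB7.

```python
from urllib.parse import quote, unquote


def escape_path_segment(segment: str) -> str:
    """Percent-encode one path segment as UTF-8 (hypothetical helper name)."""
    # safe="" means even "/" is escaped, since this is a single segment.
    return quote(segment, safe="")


escaped = escape_path_segment("L\u00b7L")
print("https://twitter.com/hashtag/" + escaped)
# prints: https://twitter.com/hashtag/L%C2%B7L

# Decoding gives back the original text with the middle dot.
print(unquote(escaped))
```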

I think it's funny how people and messaging products are gradually giving
the middle finger to https://www.ietf.org/rfc/rfc1738.txt At some point,
we'll have to invent a special url termination char because we will be
allowing all other chars to be in urls.

But, in the case of the middle dot, I don't mind adding it. It is just a
matter of what is more expected by users - that it extend the url, or
terminate the url?

Is there a new RFC for what chars are allowed in urls in the age of modern
message parsers? Seems Twitter, gmail, Facebook, etc... should all agree on
these additions. Or web pages should stop exposing unescaped urls.

On Wed, Oct 7, 2015 at 12:08 PM, Joan Montané wrote:

Just a funny effect. Twitter autolinking feature breaks own Twitter URLs.
For instance, a link to #L·L hashtag is automagically broken if it's
pasted/copied in a Tweet

https://twitter.com/hashtag/L·L




jmontane avatar jmontane commented on August 26, 2024

Yeah! I know that characters beyond plain ASCII should be escaped but, as you point out, several web services (WordPress, Twitter...) generate URLs containing such characters, so the links become unusable :(

MIDDLE DOT (U+00B7) is used as a word-internal character in Catalan. According to Unicode UAX #29, it is a MidLetter character [1] for word boundary segmentation. So it is unlikely to be used as a URL terminator.
[1] http://unicode.org/reports/tr29/#MidLetter
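
As a rough illustration of the MidLetter idea, the Python sketch below keeps a middle dot inside a word only when it sits between two letters, which is a simplification of UAX #29 rules WB6 and WB7 and not a full segmentation implementation. The names `WORD` and `words` are invented for this example.

```python
import re

MIDDLE_DOT = "\u00b7"

# [^\W\d_] matches any Unicode letter (word characters minus digits and "_").
# A word is a run of letters, optionally continued by "middle dot + letters",
# so the dot only counts when it has a letter on both sides.
WORD = re.compile(r"[^\W\d_]+(?:" + MIDDLE_DOT + r"[^\W\d_]+)*")


def words(text):
    """Split text into words, treating a letter-flanked U+00B7 as word-internal."""
    return WORD.findall(text)


# Plain \w+ does not treat U+00B7 as a word character, so the word is split.
print(re.findall(r"\w+", "col\u00b7lectiu"))   # ['col', 'lectiu']

# The MidLetter-style pattern keeps the Catalan word in one piece.
print(words("col\u00b7lectiu"))                 # ['col·lectiu']

# A free-standing middle dot between spaces is still not part of any word.
print(words("paral\u00b7lel \u00b7 nota"))      # ['paral·lel', 'nota']
```

The same "only between letters" condition could be applied when deciding whether a middle dot continues a URL or ends it.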
