twitter-text-js's Introduction

Deprecation Notice!

This repository has been merged into the twitter-text mono-repo to simplify development, testing, creating issues, and pull requests. This repo is now inactive; please continue all activity in the mono-repo and move existing issues there.

twitter-text-js's People

Contributors

bcherry, bytnar, caniszczyk, cezary, chriswren, ded, devongovett, drikin, fent, gasi, geedew, hontent, hoverbird, howardr, jakl, jasontbradshaw, jsha, kennethkufluk, kl-7, kscanne, leppert, lostplan, lukeasrodgers, mzsanford, sayrer, shinypb, smerchek, swdyh, tomykaira, twuttke

twitter-text-js's Issues

Discard mentions, hashtags and urls

Hello.

Sorry to bother you, but I need to know if given a text I can get the same text without mentions, hashtags and urls.

Thanks a lot in advance.
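One possible approach, sketched under assumptions: if the entities have already been located together with their index ranges (for example in the `{ indices: [start, end] }` shape that `twttr.txt.extractEntitiesWithIndices` is understood to return), the ranges can be cut out of the string from the end backwards. The helper name `stripEntities` and the hand-written sample ranges below are illustrative only, not part of the library.

```javascript
// Hypothetical helper: removes every [start, end) range listed in `entities`.
function stripEntities(text, entities) {
  // Work from the last entity to the first so earlier indices stay valid.
  var sorted = entities.slice().sort(function (a, b) {
    return b.indices[0] - a.indices[0];
  });
  var result = text;
  sorted.forEach(function (entity) {
    result = result.slice(0, entity.indices[0]) + result.slice(entity.indices[1]);
  });
  // Collapse the whitespace left behind by the removed entities.
  return result.replace(/\s+/g, ' ').trim();
}

// Sample ranges written by hand for illustration (assumed entity shape).
var sampleText = 'hi @bob see http://t.co/x #fun now';
var sampleEntities = [
  { indices: [3, 7] },   // @bob
  { indices: [12, 25] }, // http://t.co/x
  { indices: [26, 30] }  // #fun
];
var stripped = stripEntities(sampleText, sampleEntities);
console.log(stripped); // "hi see now"
```

Removing ranges in descending order avoids having to re-compute offsets after each cut.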

autoLink adds inappropriate attributes

twttr.txt.autoLink('hello http://twitter.com', {
  usernameClass: 'custom-user'
})

outputs:

hello <a href="http://twitter.com" usernameClass="custom-user" >http://twitter.com</a>
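A minimal sketch of one way to avoid this, assuming the fix is to separate library options from genuine HTML attributes before rendering the tag. The function name `htmlAttrsFromOptions` and the contents of `OPTION_KEYS` are assumptions made for illustration; the library's own list may differ.

```javascript
// Illustrative list of option names that configure autoLink itself
// (assumption: the real library may use a different or longer list).
var OPTION_KEYS = ['urlClass', 'usernameClass', 'hashtagClass', 'listClass'];

// Hypothetical helper: returns only the keys that should become HTML attributes.
function htmlAttrsFromOptions(options) {
  var attrs = {};
  Object.keys(options || {}).forEach(function (key) {
    if (OPTION_KEYS.indexOf(key) === -1) {
      attrs[key] = options[key];
    }
  });
  return attrs;
}

var attrs = htmlAttrsFromOptions({ usernameClass: 'custom-user', rel: 'nofollow' });
console.log(attrs); // { rel: 'nofollow' }
```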

Not allowed to pass options parameter to isInvalidTweet

I use

getTweetLength('tweety', {/*custom url shortener options*/});

but when I then call isInvalidTweet, it uses the default options for getTweetLength, which produces a different length and therefore an error.

Am I missing something?
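As a rough illustration of what is being asked for, here is a hypothetical variant of the validity check that receives the length function as a parameter, so the caller can bind whatever options were used for counting. The name `isInvalidTweetWith`, the returned strings, and the stand-in counter are assumptions for this sketch, not the library's API; the 140 limit is the one in force when these issues were filed.

```javascript
var MAX_LENGTH = 140;

// Hypothetical variant: `lengthFn` stands in for a call such as
// getTweetLength(text, options) with the caller's own options bound.
function isInvalidTweetWith(text, lengthFn) {
  if (!text) return 'empty';
  if (lengthFn(text) > MAX_LENGTH) return 'too_long';
  return false;
}

// Stand-in counter that treats every URL as 10 characters (illustrative only).
function shortUrlLength(text) {
  return text.replace(/https?:\/\/\S+/g, 'xxxxxxxxxx').length;
}

// Stand-in counter that applies no URL shortening at all.
function plainLength(text) {
  return text.length;
}
```

With a long URL in the text, `isInvalidTweetWith(text, plainLength)` and `isInvalidTweetWith(text, shortUrlLength)` can disagree, which is the mismatch described above.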

Tweet length counter seems to be conflicting with Twitter's "Character Counting" guideline about Unicode normalization

Twitter's official Character Counting guideline says "Tweet length is measured by the number of codepoints in the NFC normalized version of the text", so "café" (U+0063 U+0061 U+0066 U+0065 U+0301) should be normalized to "café" (U+0063 U+0061 U+0066 U+00E9) and counted as 4 characters. This fails in a simple test:

var twitter = require('twitter-text');
console.log(twitter.getTweetLength('cafe\u0301'));

This prints "5", and a string of 30 repetitions of "café" (which should count as 120 characters) is rejected as an invalid tweet.

By comparison, the Ruby implementation behaves differently. Testing

require "twitter-text"
include Twitter::Validation
p tweet_length("cafe" + 0x0301.chr("UTF-8"))

prints "4", and the 30 repetitions pass validation.
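For reference, a minimal sketch of the counting rule quoted above, assuming a JavaScript engine that provides `String.prototype.normalize` (ES2015 or later). The helper name `nfcLength` is made up for this example and is not part of twitter-text.

```javascript
// Counts code points in the NFC-normalized form of the text.
function nfcLength(text) {
  // Array.from iterates by code point, so astral symbols count once.
  return Array.from(text.normalize('NFC')).length;
}

var decomposed = 'cafe\u0301';          // "e" followed by a combining acute accent
console.log(decomposed.length);         // 5 (UTF-16 code units, not normalized)
console.log(nfcLength(decomposed));     // 4
```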

Add option extractUrlsWithoutProtocol

Why can't I set extractUrlsWithoutProtocol as an option when I call twttr.txt.autoLink()?
It's hardcoded as {extractUrlsWithoutProtocol: false}, so I would need to change this in the library code.

Consider generating the regular expressions as part of the build process instead of at runtime

E.g. this:

  var nonLatinHashtagChars = [];
  // Cyrillic
  addCharsToCharClass(nonLatinHashtagChars, 0x0400, 0x04ff); // Cyrillic
  addCharsToCharClass(nonLatinHashtagChars, 0x0500, 0x0527); // Cyrillic Supplement
  addCharsToCharClass(nonLatinHashtagChars, 0x2de0, 0x2dff); // Cyrillic Extended A
  addCharsToCharClass(nonLatinHashtagChars, 0xa640, 0xa69f); // Cyrillic Extended B
  // Hebrew
  addCharsToCharClass(nonLatinHashtagChars, 0x0591, 0x05bf); // Hebrew
  addCharsToCharClass(nonLatinHashtagChars, 0x05c1, 0x05c2);
  addCharsToCharClass(nonLatinHashtagChars, 0x05c4, 0x05c5);
  addCharsToCharClass(nonLatinHashtagChars, 0x05c7, 0x05c7);
  addCharsToCharClass(nonLatinHashtagChars, 0x05d0, 0x05ea);
  addCharsToCharClass(nonLatinHashtagChars, 0x05f0, 0x05f4);
  addCharsToCharClass(nonLatinHashtagChars, 0xfb12, 0xfb28); // Hebrew Presentation Forms
  addCharsToCharClass(nonLatinHashtagChars, 0xfb2a, 0xfb36);
  addCharsToCharClass(nonLatinHashtagChars, 0xfb38, 0xfb3c);
  addCharsToCharClass(nonLatinHashtagChars, 0xfb3e, 0xfb3e);
  addCharsToCharClass(nonLatinHashtagChars, 0xfb40, 0xfb41);
  addCharsToCharClass(nonLatinHashtagChars, 0xfb43, 0xfb44);
  addCharsToCharClass(nonLatinHashtagChars, 0xfb46, 0xfb4f);
  // Arabic
  addCharsToCharClass(nonLatinHashtagChars, 0x0610, 0x061a); // Arabic
  addCharsToCharClass(nonLatinHashtagChars, 0x0620, 0x065f);
  addCharsToCharClass(nonLatinHashtagChars, 0x066e, 0x06d3);
  addCharsToCharClass(nonLatinHashtagChars, 0x06d5, 0x06dc);
  addCharsToCharClass(nonLatinHashtagChars, 0x06de, 0x06e8);
  addCharsToCharClass(nonLatinHashtagChars, 0x06ea, 0x06ef);
  addCharsToCharClass(nonLatinHashtagChars, 0x06fa, 0x06fc);
  addCharsToCharClass(nonLatinHashtagChars, 0x06ff, 0x06ff);
  addCharsToCharClass(nonLatinHashtagChars, 0x0750, 0x077f); // Arabic Supplement
  addCharsToCharClass(nonLatinHashtagChars, 0x08a0, 0x08a0); // Arabic Extended A
  addCharsToCharClass(nonLatinHashtagChars, 0x08a2, 0x08ac);
  addCharsToCharClass(nonLatinHashtagChars, 0x08e4, 0x08fe);
  addCharsToCharClass(nonLatinHashtagChars, 0xfb50, 0xfbb1); // Arabic Pres. Forms A
  addCharsToCharClass(nonLatinHashtagChars, 0xfbd3, 0xfd3d);
  addCharsToCharClass(nonLatinHashtagChars, 0xfd50, 0xfd8f);
  addCharsToCharClass(nonLatinHashtagChars, 0xfd92, 0xfdc7);
  addCharsToCharClass(nonLatinHashtagChars, 0xfdf0, 0xfdfb);
  addCharsToCharClass(nonLatinHashtagChars, 0xfe70, 0xfe74); // Arabic Pres. Forms B
  addCharsToCharClass(nonLatinHashtagChars, 0xfe76, 0xfefc);
  addCharsToCharClass(nonLatinHashtagChars, 0x200c, 0x200c); // Zero-Width Non-Joiner
  // Thai
  addCharsToCharClass(nonLatinHashtagChars, 0x0e01, 0x0e3a);
  addCharsToCharClass(nonLatinHashtagChars, 0x0e40, 0x0e4e);
  // Hangul (Korean)
  addCharsToCharClass(nonLatinHashtagChars, 0x1100, 0x11ff); // Hangul Jamo
  addCharsToCharClass(nonLatinHashtagChars, 0x3130, 0x3185); // Hangul Compatibility Jamo
  addCharsToCharClass(nonLatinHashtagChars, 0xA960, 0xA97F); // Hangul Jamo Extended-A
  addCharsToCharClass(nonLatinHashtagChars, 0xAC00, 0xD7AF); // Hangul Syllables
  addCharsToCharClass(nonLatinHashtagChars, 0xD7B0, 0xD7FF); // Hangul Jamo Extended-B
  addCharsToCharClass(nonLatinHashtagChars, 0xFFA1, 0xFFDC); // half-width Hangul
  // Japanese and Chinese
  addCharsToCharClass(nonLatinHashtagChars, 0x30A1, 0x30FA); // Katakana (full-width)
  addCharsToCharClass(nonLatinHashtagChars, 0x30FC, 0x30FE); // Katakana Chouon and iteration marks (full-width)
  addCharsToCharClass(nonLatinHashtagChars, 0xFF66, 0xFF9F); // Katakana (half-width)
  addCharsToCharClass(nonLatinHashtagChars, 0xFF70, 0xFF70); // Katakana Chouon (half-width)
  addCharsToCharClass(nonLatinHashtagChars, 0xFF10, 0xFF19); // \
  addCharsToCharClass(nonLatinHashtagChars, 0xFF21, 0xFF3A); //  - Latin (full-width)
  addCharsToCharClass(nonLatinHashtagChars, 0xFF41, 0xFF5A); // /
  addCharsToCharClass(nonLatinHashtagChars, 0x3041, 0x3096); // Hiragana
  addCharsToCharClass(nonLatinHashtagChars, 0x3099, 0x309E); // Hiragana voicing and iteration mark
  addCharsToCharClass(nonLatinHashtagChars, 0x3400, 0x4DBF); // Kanji (CJK Extension A)
  addCharsToCharClass(nonLatinHashtagChars, 0x4E00, 0x9FFF); // Kanji (Unified)
  // -- Disabled as it breaks the Regex.
  //addCharsToCharClass(nonLatinHashtagChars, 0x20000, 0x2A6DF); // Kanji (CJK Extension B)
  addCharsToCharClass(nonLatinHashtagChars, 0x2A700, 0x2B73F); // Kanji (CJK Extension C)
  addCharsToCharClass(nonLatinHashtagChars, 0x2B740, 0x2B81F); // Kanji (CJK Extension D)
  addCharsToCharClass(nonLatinHashtagChars, 0x2F800, 0x2FA1F); // Kanji (CJK supplement)
  addCharsToCharClass(nonLatinHashtagChars, 0x3003, 0x3003); // Kanji iteration mark
  addCharsToCharClass(nonLatinHashtagChars, 0x3005, 0x3005); // Kanji iteration mark
  addCharsToCharClass(nonLatinHashtagChars, 0x303B, 0x303B); // Han iteration mark

  twttr.txt.regexen.nonLatinHashtagChars = regexSupplant(nonLatinHashtagChars.join(""));

With Regenerate, this could become:

var nonLatinHashtagChars = regenerate()
  // Cyrillic
  .addRange(0x0400, 0x04FF) // Cyrillic
  .addRange(0x0500, 0x0527) // Cyrillic Supplement
  .addRange(0x2DE0, 0x2DFF) // Cyrillic Extended A
  .addRange(0xA640, 0xA69F) // Cyrillic Extended B
  // Hebrew
  .addRange(0x0591, 0x05BF) // Hebrew
  .addRange(0x05C1, 0x05C2)
  .addRange(0x05C4, 0x05C5)
  .add(0x05c7)
  .addRange(0x05D0, 0x05EA)
  .addRange(0x05F0, 0x05F4)
  .addRange(0xFB12, 0xFB28) // Hebrew Presentation Forms
  .addRange(0xFB2A, 0xFB36)
  .addRange(0xFB38, 0xFB3C)
  .addRange(0xFB3E, 0xFB3E)
  .addRange(0xFB40, 0xFB41)
  .addRange(0xFB43, 0xFB44)
  .addRange(0xFB46, 0xFB4F)
  // Arabic
  .addRange(0x0610, 0x061A) // Arabic
  .addRange(0x0620, 0x065F)
  .addRange(0x066E, 0x06D3)
  .addRange(0x06D5, 0x06DC)
  .addRange(0x06DE, 0x06E8)
  .addRange(0x06EA, 0x06EF)
  .addRange(0x06FA, 0x06FC)
  .addRange(0x06FF, 0x06FF)
  .addRange(0x0750, 0x077F) // Arabic Supplement
  .addRange(0x08A0, 0x08A0) // Arabic Extended A
  .addRange(0x08A2, 0x08AC)
  .addRange(0x08E4, 0x08FE)
  .addRange(0xFB50, 0xFBB1) // Arabic Pres. Forms A
  .addRange(0xFBD3, 0xFD3D)
  .addRange(0xFD50, 0xFD8F)
  .addRange(0xFD92, 0xFDC7)
  .addRange(0xFDF0, 0xFDFB)
  .addRange(0xFE70, 0xFE74) // Arabic Pres. Forms B
  .addRange(0xFE76, 0xFEFC)
  .addRange(0x200C, 0x200C) // Zero-Width Non-Joiner
  // Thai
  .addRange(0x0E01, 0x0E3A)
  .addRange(0x0E40, 0x0E4E)
  // Hangul (Korean)
  .addRange(0x1100, 0x11FF) // Hangul Jamo
  .addRange(0x3130, 0x3185) // Hangul Compatibility Jamo
  .addRange(0xA960, 0xA97F) // Hangul Jamo Extended-A
  .addRange(0xAC00, 0xD7AF) // Hangul Syllables
  .addRange(0xD7B0, 0xD7FF) // Hangul Jamo Extended-B
  .addRange(0xFFA1, 0xFFDC) // half-width Hangul
  // Japanese and Chinese
  .addRange(0x30A1, 0x30FA) // Katakana (full-width)
  .addRange(0x30FC, 0x30FE) // Katakana Chouon and iteration marks (full-width)
  .addRange(0xFF66, 0xFF9F) // Katakana (half-width)
  .add(0xFF70) // Katakana Chouon (half-width)
  .addRange(0xFF10, 0xFF19) // \
  .addRange(0xFF21, 0xFF3A) //  - Latin (full-width)
  .addRange(0xFF41, 0xFF5A) // /
  .addRange(0x3041, 0x3096) // Hiragana
  .addRange(0x3099, 0x309E) // Hiragana voicing and iteration mark
  .addRange(0x3400, 0x4DBF) // Kanji (CJK Extension A)
  .addRange(0x4E00, 0x9FFF) // Kanji (Unified)
  .addRange(0x20000, 0x2A6DF) // Kanji (CJK Extension B)
  .addRange(0x2A700, 0x2B73F) // Kanji (CJK Extension C)
  .addRange(0x2B740, 0x2B81F) // Kanji (CJK Extension D)
  .addRange(0x2F800, 0x2FA1F) // Kanji (CJK supplement)
  .add(0x3003) // Kanji iteration mark
  .add(0x3005) // Kanji iteration mark
  .add(0x303B); // Han iteration mark

twttr.txt.regexen.nonLatinHashtagChars = nonLatinHashtagChars.toRegExp();

But it would be even better to not do it at runtime, but as part of a build process:

nonLatinHashtagChars.toString();
// returns a string literal that can be injected into a JS file as part of a regular expression literal
// '[\\u0400-\\u0527\\u0591-\\u05BF\\u05C1-\\u05C2\\u05C4-\\u05C5\\u05C7\\u05D0-\\u05EA\\u05F0-\\u05F4\\u0610-\\u061A\\u0620-\\u065F\\u066E-\\u06D3\\u06D5-\\u06DC\\u06DE-\\u06E8\\u06EA-\\u06EF\\u06FA-\\u06FC\\u06FF\\u0750-\\u077F\\u08A0\\u08A2-\\u08AC\\u08E4-\\u08FE\\u0E01-\\u0E3A\\u0E40-\\u0E4E\\u1100-\\u11FF\\u200C\\u2DE0-\\u2DFF\\u3003\\u3005\\u303B\\u3041-\\u3096\\u3099-\\u309E\\u30A1-\\u30FA\\u30FC-\\u30FE\\u3130-\\u3185\\u3400-\\u4DBF\\u4E00-\\u9FFF\\uA640-\\uA69F\\uA960-\\uA97F\\uAC00-\\uD7FF\\uFB12-\\uFB28\\uFB2A-\\uFB36\\uFB38-\\uFB3C\\uFB3E\\uFB40-\\uFB41\\uFB43-\\uFB44\\uFB46-\\uFBB1\\uFBD3-\\uFD3D\\uFD50-\\uFD8F\\uFD92-\\uFDC7\\uFDF0-\\uFDFB\\uFE70-\\uFE74\\uFE76-\\uFEFC\\uFF10-\\uFF19\\uFF21-\\uFF3A\\uFF41-\\uFF5A\\uFF66-\\uFF9F\\uFFA1-\\uFFDC]|[\\uD840-\\uD868\\uD86A-\\uD86D][\\uDC00-\\uDFFF]|\\uD869[\\uDC00-\\uDEDF\\uDF00-\\uDFFF]|\\uD86E[\\uDC00-\\uDC1F]|\\uD87E[\\uDC00-\\uDE1F]'

This way, the source code (before building) is still very readable/maintainable, but the built code is optimized for run-time performance.

Note that using Regenerate would also solve this problem with astral symbols:

  // -- Disabled as it breaks the Regex.
  //addCharsToCharClass(nonLatinHashtagChars, 0x20000, 0x2A6DF); // Kanji (CJK Extension B)

Would you be interested in a pull request that ports all the regular expressions to Regenerate + adds a simple build script?
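To illustrate why astral ranges are awkward to handle by hand, here is a small sketch. It assumes the range helper builds its characters with `String.fromCharCode`, which keeps only the low 16 bits of its argument; the function `toSurrogatePair` is written for this example and is not part of either library.

```javascript
// String.fromCharCode truncates to 16 bits, so U+20000 silently becomes U+0000.
var truncated = String.fromCharCode(0x20000);
console.log(truncated === '\u0000'); // true

// An astral code point must instead be expressed as a surrogate pair.
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;
  var high = 0xD800 + (offset >> 10);   // top 10 bits
  var low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

var first = toSurrogatePair(0x20000); // [0xD840, 0xDC00]
var last = toSurrogatePair(0x2A6DF);  // [0xD869, 0xDEDF]
console.log(first.map(function (n) { return n.toString(16); })); // [ 'd840', 'dc00' ]
console.log(last.map(function (n) { return n.toString(16); }));  // [ 'd869', 'dedf' ]
```

The two pairs correspond to the `\uD840` start and the `\uD869[\uDC00-\uDEDF...]` segment in the generated pattern shown earlier.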

extractUrls doesn't handle Non-Latin characters

This seems to be an intentional decision, but extractUrls does not handle links that contain non-Latin characters. For instance, URLs that would not be properly extracted:

extractUrls would return:

It seems the only reason that this is the case, according to the README, is that in Japanese/Korean/Chinese, sometimes links are not followed by a space. The behavior is consistent with what I see on twitter.com. To me it seems like extractUrls should be simpler and just delimit based on spaces which would allow uncommon characters, as this is a more common use case (correct me if I'm wrong on that though). And for the use case of twitter.com, since links are highlighted as you type them, Asian tweeters will know to stick a space between links and their text.

Was there some discussion on going one way or the other on this?
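To make the proposed alternative concrete, here is a rough sketch of whitespace-delimited extraction. It is not how the library works; the function name `extractUrlsBySpace` and the sample URLs are invented for illustration, and only explicit http/https links are considered.

```javascript
// Hypothetical extractor: a URL is any whitespace-delimited token
// that starts with http:// or https://.
function extractUrlsBySpace(text) {
  return text.split(/\s+/).filter(function (token) {
    return /^https?:\/\/\S+$/i.test(token);
  });
}

var urls = extractUrlsBySpace('see http://example.com/日本語 and http://example.org/a');
console.log(urls); // [ 'http://example.com/日本語', 'http://example.org/a' ]
```

The trade-off is the one described above: text written directly after a link with no space, as well as trailing punctuation, becomes part of the extracted URL.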

No way to autoLink URLs with no protocol?

If you want to use autoLink but have it link "google.com", there doesn't appear to be a way to do this even though the option is available in some methods. Is this intentional or can I submit a PR that will check passed-in options?

I'd probably change this line to look in the passed in options and if that option is set, use it instead...
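A minimal sketch of that change, assuming the hardcoded value is kept as the default and only overridden when the caller passes a boolean. The helper name `resolveUrlOptions` is hypothetical; in the library the equivalent logic would sit wherever autoLink builds the options it hands to URL extraction.

```javascript
// Hypothetical helper: start from the current hardcoded default and let
// a caller-supplied boolean override it.
function resolveUrlOptions(options) {
  var resolved = { extractUrlsWithoutProtocol: false };
  if (options && typeof options.extractUrlsWithoutProtocol === 'boolean') {
    resolved.extractUrlsWithoutProtocol = options.extractUrlsWithoutProtocol;
  }
  return resolved;
}

console.log(resolveUrlOptions());                                     // default stays false
console.log(resolveUrlOptions({ extractUrlsWithoutProtocol: true })); // caller opts in
```

Keeping `false` as the default means existing callers would see no change in behavior.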

Separate files for hashtags, uri etc.

Hi

The library is now pretty large. If I only want to parse hashtags on the client, I don't want to load the whole thing.

There are clearly some distinct modules that could be logically separated while still sharing some regexps.

Thanks,
Oleg

RTL Check throwing error on IE7

When using in IE7 (or IE8 in compat) line 458 is erroring out:

if (hashtag[0].match(twttr.txt.regexen.rtl_chars)){

IE7 doesn't support [0] for getting the first character of a string; using charAt(0) should fix it.

if (hashtag.charAt(0).match(twttr.txt.regexen.rtl_chars)){

Not compatible with Twitter Web Intents (at least in an AMD environment).

The Twitter text library is not compatible with Twitter Web Intents, as it overwrites the twttr object on window at the bottom of the file. The environment is an AMD (using require.js) module-based setup with both libraries shimmed.

As a quick fix, I replaced the check for the existence of window with code that manually extends the existing window.twttr object:

  if (typeof window != 'undefined') {
    if (window.twttr) {
      // Extend the existing object instead of overwriting it.
      var prop;
      for (prop in twttr) {
        window.twttr[prop] = twttr[prop];
      }
    } else {
      window.twttr = twttr;
    }
  }

Syntax error in javascript

Downloaded the twitter-text.js file and am getting a syntax error around the emoticon autolinking.

Uncaught SyntaxError: Invalid regular expression: /(8-#|8-E|+-(|\@|O|<|:~(|}:o{|:-[|>o<|X-/|[:-]-I-|////Ö\\|(|:|/)|?:*)|( | ))/: Nothing to repeat

It's at line 159.

&#hashtag is a formal hashtag

214   twttr.txt.regexen.hashtagBoundary = regexSupplant(/(?:^|$|[^&a-z0-9_#{latinAccentChars}#{nonLatinHashtagChars}])/);

In addition, /#hashtag is not a formal hashtag.
How about something like this:

214   twttr.txt.regexen.hashtagBoundary = regexSupplant(/(?:^|$|[^/a-z0-9_#{latinAccentChars}#{nonLatinHashtagChars}])/);

Escape full-width characters

The twitter-text-js code contains full-width characters such as ＠ (U+FF20) and ＃ (U+FF03), and they seem to cause problems in certain environments (especially some mobile web browsers). Those characters should be escaped.

change behavior of #hash autolinks

Currently, #hash2 in the string "#hash1/#hash2" will not be autolinked because / is not a valid boundary. Allowing / as a valid hashtagBoundary is a possible fix, but it would also cause #hash to be mistakenly autolinked in "possible.url/#hash". Still, this might be the more desirable behavior.
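The trade-off can be reproduced with a deliberately simplified, ASCII-only pattern. This is a sketch rather than the library's actual hashtag regex: `extractHashtags` and both boundary strings are assumptions made for this example.

```javascript
// Simplified hashtag matcher: a boundary (or start of string), then "#tag".
function extractHashtags(text, boundary) {
  var pattern = new RegExp('(?:^|' + boundary + ')#([a-z0-9_]+)', 'gi');
  var tags = [];
  var match;
  while ((match = pattern.exec(text)) !== null) {
    tags.push(match[1]);
  }
  return tags;
}

var SLASH_NOT_BOUNDARY = '[^a-z0-9_/]'; // "/" may not precede a hashtag
var SLASH_IS_BOUNDARY = '[^a-z0-9_]';   // "/" may precede a hashtag

console.log(extractHashtags('#hash1/#hash2', SLASH_NOT_BOUNDARY));     // [ 'hash1' ]
console.log(extractHashtags('#hash1/#hash2', SLASH_IS_BOUNDARY));      // [ 'hash1', 'hash2' ]
console.log(extractHashtags('possible.url/#hash', SLASH_NOT_BOUNDARY)); // []
console.log(extractHashtags('possible.url/#hash', SLASH_IS_BOUNDARY));  // [ 'hash' ]
```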

API docs

Should document the available methods and options.

Library does not detect valid CCTLD's in extractUrl method

I am trying to use this library to build a prevalidator for tweets before they are sent to the Twitter API. However, it does not recognize URLs such as bit.ly/12345 as valid URLs, while the Twitter API does wrap these non-http ccTLD URLs, producing a longer tweet that is then rejected.

I believe this bug is also present on the Twitter website and can be reproduced by filling the tweet box with up to 140 characters, including a bit.ly link without http. It will be impossible to send the tweet.

Some RegExp optimizations

I give full credit to Closure Compiler for this.
I cleaned up the output a lot; the WARNINGs are all the same one: "References to the global RegExp object prevents optimization of regular expressions."

twitter-text.js:787: WARNING - References to the global RegExp object prevents optimization of regular expressions.
        RegExp.rightContext.match(twttr.txt.regexen.endMentionMatch)) {
        ^

twitter-text.js:817: WARNING 
      var before = RegExp.$2, url = RegExp.$3, protocol = RegExp.$4, domain = RegExp.$5, path = RegExp.$7;
                   ^                ^                     ^                   ^                 ^

twitter-text.js:860: WARNING 
          url = RegExp.lastMatch;
                ^

twitter-text.js:1268: WARNING
      return ((typeof string === "string") && string.match(regex) && RegExp["$&"] === string);
                                                                     ^

twitter-text.js:1272: WARNING
    return (!string || (string.match(regex) && RegExp["$&"] === string));
                                               ^
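As one possible way to remove such references, the last two warnings could be addressed by reading the match result directly instead of `RegExp["$&"]`. This is only a sketch: the function name `isExactMatch` is invented here, and it assumes `regex` is a non-global pattern so that index 0 of the result is the full match.

```javascript
// Returns true when `regex` matches the entire string, without touching
// the static properties of the global RegExp object.
function isExactMatch(string, regex) {
  if (typeof string !== 'string') {
    return false;
  }
  var match = string.match(regex);
  return match !== null && match[0] === string;
}

console.log(isExactMatch('abc', /[a-c]+/));  // true
console.log(isExactMatch('abcd', /[a-c]+/)); // false
```

The same idea applies to the `RegExp.$2` style references: capture groups are available as `match[2]`, `match[3]`, and so on.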

'window is not defined' in NodeJS

The node module is breaking on line 1 (from the reference to window).

There needs to be a check to determine if window is undefined, and fall back to a local variable if it is.
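A minimal sketch of such a check, assuming the namespace is set up at the top of the file. The variable name `root` and the CommonJS export at the end are illustrative choices, not the library's actual code, and in the real file this would normally live inside a wrapper function.

```javascript
// Use the browser global when present; otherwise fall back to a local object.
var root = (typeof window !== 'undefined') ? window : {};

// Reuse an existing twttr namespace if another script already created one.
var twttr = root.twttr || {};
twttr.txt = twttr.txt || {};

// Export for CommonJS consumers such as Node, when `module` exists.
if (typeof module !== 'undefined' && module.exports) {
  module.exports = twttr.txt;
}
```

Using `typeof window` avoids the ReferenceError, since `typeof` is safe to apply to an undeclared identifier.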
