dropbox / zxcvbn

Low-Budget Password Strength Estimation

Home Page: https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/wheeler

License: MIT License

CoffeeScript 97.90% Python 2.10%

zxcvbn's Introduction

_________________________________________________/\/\___________________
_/\/\/\/\/\__/\/\__/\/\____/\/\/\/\__/\/\__/\/\__/\/\________/\/\/\/\___
_____/\/\______/\/\/\____/\/\________/\/\__/\/\__/\/\/\/\____/\/\__/\/\_
___/\/\________/\/\/\____/\/\__________/\/\/\____/\/\__/\/\__/\/\__/\/\_
_/\/\/\/\/\__/\/\__/\/\____/\/\/\/\______/\______/\/\/\/\____/\/\__/\/\_
________________________________________________________________________

zxcvbn is a password strength estimator inspired by password crackers. Through pattern matching and conservative estimation, it recognizes and weighs 30k common passwords, common names and surnames according to US census data, popular English words from Wikipedia and US television and movies, and other common patterns like dates, repeats (aaa), sequences (abcd), keyboard patterns (qwertyuiop), and l33t speak.

Consider using zxcvbn as an algorithmic alternative to password composition policy — it is more secure, flexible, and usable when sites require a minimal complexity score in place of annoying rules like "passwords must contain three of {lower, upper, numbers, symbols}".

More secure: policies often fail both ways, allowing weak passwords (P@ssword1) and disallowing strong passwords.
More flexible: zxcvbn allows many password styles to flourish so long as it detects sufficient complexity — passphrases are rated highly given enough uncommon words, keyboard patterns are ranked based on length and number of turns, and capitalization adds more complexity when it's unpredictaBle.
More usable: zxcvbn is designed to power simple, rule-free interfaces that give instant feedback. In addition to strength estimation, zxcvbn includes minimal, targeted verbal feedback that can help guide users towards less guessable passwords.

For further detail and motivation, please refer to the USENIX Security '16 paper and presentation.

At Dropbox we use zxcvbn (Release notes) on our web, desktop, iOS and Android clients. If JavaScript doesn't work for you, others have graciously ported the library to these languages:

zxcvbn-python (Python)
zxcvbn-cpp (C/C++/Python/JS)
zxcvbn-c (C/C++)
zxcvbn-rs (Rust)
zxcvbn-go (Go)
zxcvbn4j (Java)
nbvcxz (Java)
zxcvbn-ruby (Ruby)
zxcvbn-js (Ruby [via ExecJS])
zxcvbn-ios (Objective-C)
zxcvbn-cs (C#/.NET)
szxcvbn (Scala)
zxcvbn-php (PHP)
zxcvbn-api (REST)
ocaml-zxcvbn (OCaml bindings for zxcvbn-c)

Integrations with other frameworks:

angular-zxcvbn (AngularJS)

Installation

zxcvbn detects and supports CommonJS (node, browserify) and AMD (RequireJS). In the absence of those, it adds a single function zxcvbn() to the global namespace.

Bower

Install node and bower if you haven't already.

Get zxcvbn:

cd /path/to/project/root
bower install zxcvbn

Add this script to your index.html:

<script src="bower_components/zxcvbn/dist/zxcvbn.js">
</script>

To make sure it loaded properly, open in a browser and type zxcvbn('Tr0ub4dour&3') into the console.

To pull in updates and bug fixes:

bower update zxcvbn

Node / npm / MeteorJS

zxcvbn works identically on the server.

$ npm install zxcvbn
$ node
> var zxcvbn = require('zxcvbn');
> zxcvbn('Tr0ub4dour&3');

RequireJS

Add zxcvbn.js to your project (using bower, npm or direct download) and import as usual:

requirejs(["relpath/to/zxcvbn"], function (zxcvbn) {
    console.log(zxcvbn('Tr0ub4dour&3'));
});

Browserify / Webpack

If you're using npm and have require('zxcvbn') somewhere in your code, browserify and webpack should just work.

$ npm install zxcvbn
$ echo "console.log(require('zxcvbn'))" > mymodule.js
$ browserify mymodule.js > browserify_bundle.js
$ webpack mymodule.js webpack_bundle.js

But we recommend against bundling zxcvbn via tools like browserify and webpack, for three reasons:

Minified and gzipped, zxcvbn is still several hundred kilobytes. (Significantly grows bundle size.)
Most sites will only need zxcvbn on a few pages (registration, password reset).
Most sites won't need zxcvbn() immediately upon page load; since zxcvbn() is typically called in response to user events like filling in a password, there's ample time to fetch zxcvbn.js after initial html/css/js loads and renders.

See the performance section below for tips on loading zxcvbn stand-alone.

Tangentially, if you want to build your own standalone, consider tweaking the browserify pipeline used to generate dist/zxcvbn.js:

$ browserify --debug --standalone zxcvbn \
    -t coffeeify --extension='.coffee' \
    -t uglifyify \
    src/main.coffee | exorcist dist/zxcvbn.js.map >| dist/zxcvbn.js

--debug adds an inline source map to the bundle. exorcist pulls it out into dist/zxcvbn.js.map.
--standalone zxcvbn exports a global zxcvbn when CommonJS/AMD isn't detected.
-t coffeeify --extension='.coffee' compiles .coffee to .js before bundling. This is convenient as it allows .js modules to import from .coffee modules and vice-versa. Instead of this transform, one could also compile everything to .js first (npm run prepublish) and point browserify to lib instead of src.
-t uglifyify minifies the bundle through UglifyJS, maintaining proper source mapping.

Manual installation

Download zxcvbn.js.

Add to your .html:

<script type="text/javascript" src="path/to/zxcvbn.js"></script>

Usage

Try zxcvbn interactively to see these docs in action.

zxcvbn(password, user_inputs=[])

zxcvbn() takes one required argument, a password, and returns a result object with several properties:

result.guesses            # estimated guesses needed to crack password
result.guesses_log10      # order of magnitude of result.guesses

result.crack_times_seconds # dictionary of back-of-the-envelope crack time
                          # estimations, in seconds, based on a few scenarios:
{
  # online attack on a service that ratelimits password auth attempts.
  online_throttling_100_per_hour

  # online attack on a service that doesn't ratelimit,
  # or where an attacker has outsmarted ratelimiting.
  online_no_throttling_10_per_second

  # offline attack. assumes multiple attackers,
  # proper user-unique salting, and a slow hash function
  # w/ moderate work factor, such as bcrypt, scrypt, PBKDF2.
  offline_slow_hashing_1e4_per_second

  # offline attack with user-unique salting but a fast hash
  # function like SHA-1, SHA-256 or MD5. A wide range of
  # reasonable numbers anywhere from one billion - one trillion
  # guesses per second, depending on number of cores and machines.
  # ballparking at 10B/sec.
  offline_fast_hashing_1e10_per_second
}

result.crack_times_display # same keys as result.crack_times_seconds,
                           # with friendlier display string values:
                           # "less than a second", "3 hours", "centuries", etc.

result.score      # Integer from 0-4 (useful for implementing a strength bar)

  0 # too guessable: risky password. (guesses < 10^3)

  1 # very guessable: protection from throttled online attacks. (guesses < 10^6)

  2 # somewhat guessable: protection from unthrottled online attacks. (guesses < 10^8)

  3 # safely unguessable: moderate protection from offline slow-hash scenario. (guesses < 10^10)

  4 # very unguessable: strong protection from offline slow-hash scenario. (guesses >= 10^10)

result.feedback   # verbal feedback to help choose better passwords. set when score <= 2.

  result.feedback.warning     # explains what's wrong, eg. 'this is a top-10 common password'.
                              # not always set -- sometimes an empty string

  result.feedback.suggestions # a possibly-empty list of suggestions to help choose a less
                              # guessable password. eg. 'Add another word or two'

result.sequence   # the list of patterns that zxcvbn based the
                  # guess calculation on.

result.calc_time  # how long it took zxcvbn to calculate an answer,
                  # in milliseconds.
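
The relationship between result.guesses, the crack-time scenarios and result.score can be sketched roughly as follows. This is only an illustration built from the rates in the scenario key names and the score thresholds listed above; the helper names crackTimesSeconds and scoreFromGuesses are hypothetical, and the library's internal cutoffs may differ slightly.

```javascript
// Guess rates implied by the scenario key names above (guesses per second).
var RATES_PER_SECOND = {
  online_throttling_100_per_hour: 100 / 3600,
  online_no_throttling_10_per_second: 10,
  offline_slow_hashing_1e4_per_second: 1e4,
  offline_fast_hashing_1e10_per_second: 1e10
};

// Back-of-the-envelope crack time: guesses divided by guesses per second.
function crackTimesSeconds(guesses) {
  var times = {};
  Object.keys(RATES_PER_SECOND).forEach(function (scenario) {
    times[scenario] = guesses / RATES_PER_SECOND[scenario];
  });
  return times;
}

// Map a guess count onto the documented 0-4 score buckets.
function scoreFromGuesses(guesses) {
  if (guesses < 1e3) return 0;
  if (guesses < 1e6) return 1;
  if (guesses < 1e8) return 2;
  if (guesses < 1e10) return 3;
  return 4;
}
```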

The optional user_inputs argument is an array of strings that zxcvbn will treat as an extra dictionary. This can be whatever list of strings you like, but is meant for user inputs from other fields of the form, like name and email. That way a password that includes a user's personal information can be heavily penalized. This list is also good for site-specific vocabulary — Acme Brick Co. might want to include ['acme', 'brick', 'acmebrick', etc].
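
As a minimal sketch of how a site might enforce a minimum score while passing form fields as user_inputs: the checkPassword helper and its return shape below are hypothetical, not part of zxcvbn. The estimator is passed in as an argument so the sketch does not depend on how the library was loaded.

```javascript
// `estimate` is any function with zxcvbn's (password, user_inputs) signature.
function checkPassword(estimate, password, userInputs, minScore) {
  var result = estimate(password, userInputs);
  // Guard against a missing feedback object rather than assume it is set.
  var feedback = result.feedback || {};
  return {
    acceptable: result.score >= minScore,
    score: result.score,
    warning: feedback.warning || '',
    suggestions: feedback.suggestions || []
  };
}

// Example call, assuming zxcvbn is already loaded:
//   checkPassword(zxcvbn, 'Tr0ub4dour&3', ['acme', 'brick', 'acmebrick'], 3);
```

Rejecting on score alone keeps the interface rule-free; the warning and suggestions can be shown to the user when acceptable is false.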

Performance

runtime latency

zxcvbn operates below human perception of delay for most input: ~5-20ms for ~25 char passwords on modern browsers/CPUs, ~100ms for passwords around 100 characters. To bound runtime latency for really long passwords, consider sending zxcvbn() only the first 100 characters or so of user input.
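
Bounding the input can be as simple as the sketch below. The helper name boundedPassword is hypothetical, and the limit of 100 just follows the suggestion above.

```javascript
// Cap the input length before estimation to bound runtime latency.
var MAX_ESTIMATED_LENGTH = 100;

function boundedPassword(password) {
  return password.slice(0, MAX_ESTIMATED_LENGTH);
}

// Example call, assuming zxcvbn is already loaded:
//   zxcvbn(boundedPassword(input.value));
```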

script load latency

zxcvbn.js bundled and minified is about 400kB gzipped or 820kB uncompressed, most of which is dictionaries. Consider these tips if you're noticing page load latency on your site.

Make sure your server is configured to compress static assets for browsers that support it. (nginx tutorial, Apache/IIS tutorial.)

Then try one of these alternatives:

1. Put your <script src="zxcvbn.js"> tag at the end of your html, just before the closing </body> tag. This ensures your page loads and renders before the browser fetches and loads zxcvbn.js. The downside with this approach is zxcvbn() becomes available later than had it been included in <head>, which is not an issue on most signup pages where users are filling out other fields first.
2. If you're using RequireJS, try loading zxcvbn.js separately from your main bundle. Something to watch out for: if zxcvbn.js is required inside a keyboard handler waiting for user input, the entire script might be loaded only after the user presses their first key, creating nasty latency. Avoid this by calling your handler once upon page load, independent of user input, such that the requirejs() call runs earlier.
3. Use the HTML5 async script attribute. Downside: doesn't work in IE7-9 or Opera Mini.
4. Include an inline <script> in <head> that asynchronously loads zxcvbn.js in the background. Advantage over (3): it works in older browsers.

// cross-browser asynchronous script loading for zxcvbn.
// adapted from http://friendlybit.com/js/lazy-loading-asyncronous-javascript/

(function() {

  var ZXCVBN_SRC = 'path/to/zxcvbn.js';

  var async_load = function() {
    var first, s;
    s = document.createElement('script');
    s.src = ZXCVBN_SRC;
    s.type = 'text/javascript';
    s.async = true;
    first = document.getElementsByTagName('script')[0];
    return first.parentNode.insertBefore(s, first);
  };

  if (window.attachEvent != null) {
    window.attachEvent('onload', async_load);
  } else {
    window.addEventListener('load', async_load, false);
  }

}).call(this);

Development

Bug reports and pull requests welcome!

git clone https://github.com/dropbox/zxcvbn.git

zxcvbn is built with CoffeeScript, browserify, and uglify-js. CoffeeScript source lives in src, which gets compiled, bundled and minified into dist/zxcvbn.js.

npm run build    # builds dist/zxcvbn.js
npm run watch    # same, but quickly rebuilds as changes are made in src.

For debugging, both build and watch output an external source map dist/zxcvbn.js.map that points back to the original CoffeeScript code.

Two source files, adjacency_graphs.coffee and frequency_lists.coffee, are generated by python scripts in data-scripts that read raw data from the data directory.

For node developers, in addition to dist, the zxcvbn npm module includes a lib directory (hidden from git) that includes one compiled .js and .js.map file for every .coffee in src. See prepublish in package.json to learn more.

Acknowledgments

Dropbox for supporting open source!

Mark Burnett for releasing his 10M password corpus and for his 2005 book, Perfect Passwords: Selection, Protection, Authentication.

Wiktionary contributors for building a frequency list of English words as used in television and movies.

Researchers at Concordia University for studying password estimation rigorously and recommending zxcvbn.

And xkcd for the inspiration 👍🐴🔋❤️

zxcvbn's Issues

zxcvbn with Require.js

Did anyone ever try this with Require.js?

Cannot read property 'toLowerCase' of undefined

Happens when one of my variables in the array of the second argument is of type "undefined".

I used [variable || ''] in my code but it just feels like it's something that should be implemented within the core of zxcvbn, otherwise it just won't work whenever one of the members of the user_inputs array is not a string.

Thanks.
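
The workaround described in this issue can be generalised on the caller's side. A minimal sketch, assuming a hypothetical helper cleanUserInputs (not part of zxcvbn) that keeps only non-empty strings before the list is handed to zxcvbn():

```javascript
// Drop undefined, null, empty strings and non-string values.
function cleanUserInputs(inputs) {
  return (inputs || []).filter(function (value) {
    return typeof value === 'string' && value.length > 0;
  });
}

// Example call, assuming zxcvbn is already loaded:
//   zxcvbn(password, cleanUserInputs([firstName, lastName, email]));
```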

zxcvbn-async.js includes reference to external zxcvbn.js file

Hi there,

It would appear that the async file contains a reference to zxcvbn.js on dropbox.

There are two issues with this.

Firstly, the link is not under https, so causes certificate problems on SSL secured pages where it is used. eg. password reset forms.
Secondly, what happens if the dropbox location vanishes for whatever reason? :)

zxcvbn-async.js contains hard coded incorrect path

src="//dl.dropbox.com/u/209/zxcvbn/zxcvbn.js"

should probably be:

src="zxcvbn.js"

Viability of a Bloom filter?

I know you can't include a Bloom filter due to the frequency rank, but what about including one filter per rank? Is there something that prevents you from using one filter for common words, another for less common words, etc? It could greatly cut down on the library's size...

Rename zxcvbn.js to zxcvbn.min.js and leave zxcvbn.js unminified

Because I want to minify it myself through other minifiers.

And it's a resource we can build, not source code itself. Doesn't belong to source control either.

See
http://stackoverflow.com/questions/10854845/should-i-version-control-the-minified-versions-of-my-jquery-plugins and http://blog.andrewray.me/dealing-with-compiled-files-in-git/

QML support

Can this also be used in Qt's QML as a JavaScript import module? I cannot get it to work.

leetspeak dictionary does not catch 8 = B

Trou8ador! is given a score of 2, entropy of 33

Overestimates cardinality for bruteforce substrings when other types of characters present elsewhere

It seems that the cardinality used to calculate entropy of bruteforce substrings is calculated based on the entire string. For example:

frzplfqetuothv: cardinality 26, as expected

frzplfqetuothvpassword: cardinality 26, as expected

frzplfqetuothvpasswordA: cardinality 85.

The A present elsewhere in the string causes it to assume that the letters of frzplfqetuothv were sampled from the larger set of characters. This doesn't make sense, because users often pick passwords with e.g. a punctuation mark attached. Thus it is vastly overestimating the entropy of such passwords. The cardinality should be calculated per bruteforce substring.

Related: It parses passwords like frzplfqetuothvCOCIWDZOAZPVRL as one long bruteforce string. It should attempt to split such strings into multiple bruteforce runs with lower cardinality.
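
The per-substring calculation proposed here could look roughly like the sketch below. The class sizes (26 lowercase, 26 uppercase, 10 digits, 33 symbols) are assumptions taken from the figures quoted in the issues on this page, and the function names are hypothetical rather than zxcvbn's own.

```javascript
// Sum the sizes of the character classes that appear in `text`.
function cardinality(text) {
  var size = 0;
  if (/[a-z]/.test(text)) size += 26;
  if (/[A-Z]/.test(text)) size += 26;
  if (/[0-9]/.test(text)) size += 10;
  if (/[^a-zA-Z0-9]/.test(text)) size += 33;
  return size;
}

// Entropy in bits of a brute-force run, using only that run's own characters
// instead of the cardinality of the whole password.
function bruteforceEntropy(run) {
  return run.length * Math.log2(cardinality(run));
}
```

With this approach the run frzplfqetuothv keeps cardinality 26 no matter which characters appear elsewhere in the password.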

test.js missing from /test directory

test/index.html loads script test.js. This file is removed and prevents index.html from displaying any results. Prior pulls had this file. Suggest restore test/test.js or rewrite test/index.html to display results.

Inconsistent results when going from 2 to 3 digits

The library says that word781 has less entropy than word78, because 781 is identified as digits but 78 is identified as bruteforce with lower entropy:

password:   word78
entropy:    20.928
crack time (seconds):   99.743
crack time (display):   3 minutes
score from 0 to 4:      0
calculation time (ms):  0

password:   word781
entropy:    18.677
crack time (seconds):   20.95
crack time (display):   instant
score from 0 to 4:      0
calculation time (ms):  1

i18n support?

More feature request than issue. Would be great to have i18n support so that the "very weak", "weak", "so-so", "good", and "great" could be customized by language.

Cleanup Structure

Would suggest a preliminary cleanup of the project structure...

data/          - data directory
data-scripts/  - python scripts to load/parse data
dist/          - folder for target/compiled js
src/           - coffeescript & javascript source files
tests/         - unit tests (mocha?)
bower.json     - bower package info
package.json   - npm/node package information

package.json can reference src/(index.js) as the primary file. npm scripts can be added to package.json for build, and/or use something like gulp.

Given that npm is already used for the installation of coffee-script and the nature of this package, it may be worthwhile to use uglify-js over closure compiler. It may not be quite as tight, but would reduce the need for external tooling (beyond node/npm) for this.

Converting the scripts to use CommonJS/node syntax could be used in combination with Browserify for the build, enabling a global and amd target in the dist directory from the same source.

I'd be happy to take this on, creating a fork and PR for the changes if there would be interest for including such changes/updates.

Error is : zxcvbn.js, line 43 character 47.

Dear Sir,

First of all, thanks for contributing zxcvbn.js. The API works perfectly for me, but I am getting an error on IE8: zxcvbn.js, line 43 character 47.
I included zxcvbn-async.js on my page after changing the path to zxcvbn.
I really appreciate your help.

Thanks & Regards
Saurabh Sharma
+91-9602273529

License

Hey, can you clarify what license the code is intended to be under? And in case it is GPL or LGPL, have you considered using a more liberal license like MIT/BSD/APLv2?

bruteforce should score by frequency/rank

A very common password style is to take the first letter of each word in a sentence/phrase, possibly with some substitutions. This leads to a fairly random-looking password that is easy to remember but hard to brute force. The letters are not randomly distributed, however, as they're related to the frequencies of letters as the first letter of words. There are far more words starting with s, c, p than with x, z, y, q or numbers. Thus, instead of treating it as cardinality 26 for any lowercase letter, treat each letter individually based on its rank in the list of 95 printable characters.

Uncaught ReferenceError: zxcvbn is not defined

zxcvbn = require('zxcvbn') returns undefined when the whole library is packed with webpack.

without dictionary?

How much would I lose by running this library without the dictionary part? My concern is file size, since a 300kb download is unacceptable in [my] mobile scenario. What does Dropbox do about that?
Thanks for the library, anyhow; it is awesome.

Suggestion: Search hash for leaked passwords

Many sites store user passwords as hashes (SHA1, MD5, etc), and many do so without using any type of salt.
This week we have learned of various massive leaks of lists of such hashed, unsalted passwords (e.g. LinkedIn, Last.fm and eHarmony).

Many users use one password, or have a pool of passwords they repeat among multiple sites, probably not knowing their password has been compromised somewhere else.

I know googling user's passwords is essentially a bad idea, but I think establishing a secure connection to Google and searching a password hash (SHA1 or MD5) shouldn't compromise users' security.

Each entry in the match sequence needs to add some inherent entropy

zxcvbn decomposes each password into a match sequence, and then for each match says, "aha, I can find this part in an English dictionary (7 bits)", "this next piece is a name (4 bits)", "this is brute force (9 bits)".

There is an inherent entropy to changing models each time. It's probably not much (2-6 bits per entry in the match sequence, I'm guessing) but at the moment zxcvbn is underestimating passwords that jump between a number of these.

Commit 5407bc2 broke backwards compatibility for directly-loaded client-side API

Previously, after loading the zxcvbn.js script directly (either synchronously or asynchronously), you'd call it with window.zxcvbn(password).

After commit 5407bc2, you have to do window.zxcvbn.zxcvbn(password), which is a backwards-incompatible change. If that was intentional, you'll want to publish a new major revision instead of a new patch revision as you did.
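
Callers that need to work with both export shapes described in this issue could use a small shim. This is a sketch only; resolveZxcvbn is a hypothetical helper, not something the library provides.

```javascript
// Accept either export shape: a callable global, or an object carrying
// the function under a `zxcvbn` property (the shape this issue reports).
function resolveZxcvbn(exported) {
  if (typeof exported === 'function') return exported;
  if (exported && typeof exported.zxcvbn === 'function') return exported.zxcvbn;
  throw new Error('zxcvbn is not loaded');
}

// Example call in a browser:
//   var estimate = resolveZxcvbn(window.zxcvbn);
//   estimate('Tr0ub4dour&3');
```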

Full user_input matches are not taken into account

zxcvbn("[email protected]", "[email protected]") has an entropy of 24.015, even though it should be somewhere around 1.

Somehow zxcvbn does not recognize full matches of user_input.

False positive for a string matching exactly user input

We are using firstname, lastname and email as parts of the blacklist. The following call returns a very high score, nevertheless the email address is part of the blacklist:

var mail = '[email protected]',
    blacklist = ['[email protected]'];

console.log(zxcvbn(mail, blacklist));

{
  "password":"[email protected]",
  "entropy":111.544,
  "match_sequence":[
    {
      "pattern":"dictionary",
      "i":0,
      "j":0,
      "token":"I",
      "matched_word":"i",
      "rank":2,
      "dictionary_name":"english",
      "base_entropy":1,
      "uppercase_entropy":1,
      "l33t_entropy":0,
      "entropy":2
    },
    {
      "pattern":"bruteforce",
      "i":1,
      "j":1,
      "token":"m",
      "entropy":6.409390936137703,
      "cardinality":85
    },
    {
      "pattern":"dictionary",
      "i":2,
      "j":3,
      "token":"me",
      "matched_word":"me",
      "rank":10,
      "dictionary_name":"english",
      "base_entropy":3.3219280948873626,
      "uppercase_entropy":0,
      "l33t_entropy":0,
      "entropy":3.3219280948873626
    },
    {
      "pattern":"bruteforce",
      "i":4,
      "j":5,
      "token":"r.",
      "entropy":12.818781872275405,
      "cardinality":85
    },
    {
      "pattern":"dictionary",
      "i":6,
      "j":7,
      "token":"no",
      "matched_word":"no",
      "rank":18,
      "dictionary_name":"english",
      "base_entropy":4.169925001442312,
      "uppercase_entropy":0,
      "l33t_entropy":0,
      "entropy":4.169925001442312
    },
    {
      "pattern":"bruteforce",
      "i":8,
      "j":10,
      "token":"ch.",
      "entropy":19.228172808413106,
      "cardinality":85
    },
    {
      "pattern":"dictionary",
      "i":11,
      "j":15,
      "token":"nicht",
      "matched_word":"nicht",
      "rank":24155,
      "dictionary_name":"english",
      "base_entropy":14.56003423231944,
      "uppercase_entropy":0,
      "l33t_entropy":0,
      "entropy":14.56003423231944
    },
    {
      "pattern":"bruteforce",
      "i":16,
      "j":16,
      "token":".",
      "entropy":6.409390936137703,
      "cardinality":85
    },
    {
      "pattern":"dictionary",
      "i":17,
      "j":23,
      "token":"Invited",
      "matched_word":"invited",
      "rank":1175,
      "dictionary_name":"english",
      "base_entropy":10.198445041452363,
      "uppercase_entropy":1,
      "l33t_entropy":0,
      "entropy":11.198445041452363
    },
    {
      "pattern":"dictionary",
      "i":24,
      "j":24,
      "token":"@",
      "matched_word":"a",
      "rank":5,
      "dictionary_name":"english",
      "l33t":true,
      "sub":{
        "@":"a"
      },
      "sub_display":"@ -> a",
      "base_entropy":2.321928094887362,
      "uppercase_entropy":0,
      "l33t_entropy":1,
      "entropy":3.321928094887362
    },
    {
      "pattern":"dictionary",
      "i":25,
      "j":28,
      "token":"mail",
      "matched_word":"mail",
      "rank":1135,
      "dictionary_name":"english",
      "base_entropy":10.148476582178278,
      "uppercase_entropy":0,
      "l33t_entropy":0,
      "entropy":10.148476582178278
    },
    {
      "pattern":"bruteforce",
      "i":29,
      "j":29,
      "token":".",
      "entropy":6.409390936137703,
      "cardinality":85
    },
    {
      "pattern":"dictionary",
      "i":30,
      "j":32,
      "token":"com",
      "matched_word":"com",
      "rank":2994,
      "dictionary_name":"english",
      "base_entropy":11.547858506058418,
      "uppercase_entropy":0,
      "l33t_entropy":0,
      "entropy":11.547858506058418
    }
  ],
  "crack_time":1.8922410863462927e+29,
  "crack_time_display":"centuries",
  "score":4,
  "calc_time":10
}

Porting this awesome lib for another language - guidelines

First of all, I want to congratulate everyone involved in the development of this project.

I'm interested in improving this library's usefulness for the Portuguese language. To achieve that, I guess I need to research common words and the most popular password words in this language to build a more accurate bad-password list.

Is there any information describing how to change the code to provide a dictionary of words in another language, or perhaps a simple approach for using this lib with a list of bad and forbidden passwords in another language?

Thanks!

Has 'time to crack' gotten shorter with more GPUs?

The Ars Technica article
http://arstechnica.com/security/2013/05/how-crackers-make-minced-meat-out-of-your-passwords/2/
said that they quickly bruteforced all passwords of 6 characters or fewer, plus the all-upper or all-lower passwords of length 7 or 8, in just 41 seconds.
Yet the 'crack time (display):' for an all caps 7 char password is 5 years. I suspect more GPU power has made cracking faster.

Suggestion: Better URL for the demo

Whenever I need to register for an account on a site that doesn't implement zxcvbn, I search Google for the blog post, search the page for "demo", and do the thing.

Maybe put it at dropbox.com/zxcvbn, register a domain, something like that?

http://dl.dropbox.com/u/209/zxcvbn/test/index.html is too long to memorize!

Suggestion: Use Google to support foreign languages, slang et al

Right now zxcvbn works on the assumption that the attacker limits himself to English, which he will most certainly not do.
Just one example: a common German word like "Schokolade" gets a score of 4/4.

Now it is of course impossible to include every possible language. But you don't need to. There is a huge database of existing words available for query: Google will happily tell you that millions of websites include the word, which is a very clear indication that this is not a good password. So I suggest you include the results of a Google query in your scoring algorithm.

bye, ju

having trouble installing

Hi, I'm trying to install zxcvbn by adding it to my project's bower.json, but it seems unable to find the module. If I install it directly using "bower install", it works as expected. Should it be trying to pull from lowe/zxcvbn? I see that navigating to github.com/lowe/zxcvbn in my browser redirects me here. Perhaps that is confusing bower somehow? For now I have it working by using https://raw.githubusercontent.com/dropbox/zxcvbn/482ed03a10779ec125100721c2d828b97abf9ea6/zxcvbn.js for my url, but obviously this is not ideal. Any ideas?

user_input Check is Case-Sensitive

I think that user_input should be penalized without regard to case. For example, say I'm checking the strength of a password on a website when a user updates their profile, and I want to make sure they don't include their username in the password; if their username is "cheese", and they set their password as "CheeSe", then the user_input penalty would not be applied, because the matching is case-sensitive.

zxcvbn( 'cheese' );
Object {password: "cheese", entropy: 6.15, match_sequence: Array[1], crack_time: 0.004, crack_time_display: "instant"…}
zxcvbn( 'cheese', [ 'cheese' ] );
Object {password: "cheese", entropy: 0, match_sequence: Array[1], crack_time: 0, crack_time_display: "instant"…}
zxcvbn( 'cheese', [ 'CheeSe' ] );
Object {password: "cheese", entropy: 6.15, match_sequence: Array[1], crack_time: 0.004, crack_time_display: "instant"…}

Note that you have to reload the page between tests to get accurate results, because of #31.
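
Until matching is made case-insensitive, a caller-side workaround might be to lowercase the list before passing it in. This sketch assumes, as the results above suggest, that the password itself is already compared in lowercase; lowercaseUserInputs is a hypothetical helper.

```javascript
// Normalise every user input to a lowercase string.
function lowercaseUserInputs(inputs) {
  return inputs.map(function (value) {
    return String(value).toLowerCase();
  });
}

// Example call, assuming zxcvbn is already loaded:
//   zxcvbn('cheese', lowercaseUserInputs(['CheeSe']));
```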

Two-digit year date problems

I've noticed a few issues with date matching.

For example:

04052001 is matched as a date, but 040501 is not (and has higher entropy)

And with separators:

27.05.2005 - date, entropy=17.434
27.05.05 - bruteforce, entropy = 43.41

The latter, at least, seems to be due to check_date only accepting 4-digit years. Also, both day and year would need to be checked for the 2-digit year case to see if either is <= 31.

I also noticed that zero is accepted for days and months, though that isn't a big issue as it will at worst cause an underestimate for the entropy.
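
The two-digit-year check suggested here could be sketched as below. The function names are hypothetical and this is not zxcvbn's check_date; it only covers day-first and year-first readings with the month in the middle, and it rejects zero for days and months.

```javascript
// A day/month pair is plausible when both are non-zero and in range.
function isDayMonth(day, month) {
  return day >= 1 && day <= 31 && month >= 1 && month <= 12;
}

// For three two-digit fields a.b.c, try both day-first and year-first
// readings, since either outer field may be the day.
function matchesTwoDigitYearDate(a, b, c) {
  return isDayMonth(a, b) || isDayMonth(c, b);
}
```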

Detect and account for repeated choices

There are a number of examples of overestimated entropies for passwords like these:
"ekekekekek", " . . . . . " (#39). Also space-separated passwords ("dark and stormy night") get quite a high entropy (#21).

I think this is because the algorithm does not detect higher level repetition patterns. For space separated strings it's okay that it does not take into account that there is a repetition of

(dictionaryword)(bruteforce)

because the algorithm ignores the entropy in that information anyway. The problem is that it does not recognize that always the same selection is picked within a given searchspace. To illustrate this... you get an even higher entropy for "dark dark dark dark".

So when calculating the entropy, a selection that occurs n times should add ln(n) plus the entropy of the selection itself, instead of n times the entropy of the selection itself. This should probably be applied before picking the lowest entropy.
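
The proposal can be written down in a few lines. In this sketch the logarithm is taken base 2 rather than ln so that the result stays in bits, which is an assumption on top of the issue text; both function names are hypothetical.

```javascript
// Proposed: pay for the selection once, plus a small term for the repeat count.
function repeatedSelectionEntropy(selectionBits, n) {
  return selectionBits + Math.log2(n);
}

// What the issue describes as the current behaviour, for comparison.
function naiveRepeatedEntropy(selectionBits, n) {
  return selectionBits * n;
}
```

For a 10-bit word repeated four times, the proposal yields 12 bits where the naive sum yields 40.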

". . . . . ." is a good password

It seems like the "repeat" strategy should apply to multiple-character runs.

Add option to get dictionary from server

Would it be at all possible to be able to pass in an option to include the dictionary via server as opposed to be included directly in the library like it is? Just to help out with loading the library.

phonetic alphabet should be tested

The string

alpha bravo charlie delta

tests as very strong, even though it's just ABCD spelled out in the ITU phonetic alphabet.

Crack time is too optimistic

zxcvbn assumes that passwords are hashed with bcrypt, scrypt or PBKDF2. This is fine when you know exactly what hashing algorithm is used. Sadly this isn't always the case and it should be stated more clearly in the project documentation.

The assumed time per guess (with a strong hash) is 1/100 seconds, and with 100 cores that makes 1/10000 seconds. With MD5, SHA256 or SHA512, the situation is different. For example, with oclHashcat it takes just 1/1952000000 seconds to calculate a single SHA512 hash (using 8 GPUs). See the oclHashcat performance table.

For example with skjiqonjhrp, zxcvbn returns 3 years as crack time. If it was hashed with SHA512, the crack time would be about 320 seconds. I calculated it with entropy returned by zxcvbn and using the same formula as in your blog post: 0.5 * 2^40.188 * 1/1952000000. The difference is huge.

If people use this as a general "how good is my password" meter, it will give overly optimistic results, since there are countless services out there using MD5/SHA256/SHA512.

Underestimation of cardinality for unicode characters

calc_bruteforce_cardinality currently considers 'symbols' anything which is not an alphanumeric character, and gives symbols a cardinality of 33. This makes sense for ASCII passwords.

On Linux and Mac operating systems, it is quite easy to enter lots of common Unicode characters. For instance, on my Mac, if I press the right alt key, I can get the following symbols by tapping any other key on the keyboard:

«“‘¥~‹÷´`≠
„Ω€®™æ¨œøπ
åß∂ƒ∞∆ªº¬
∑†©√∫˜µ

Another set of characters can be obtained with shift+alt.

Some of them can be used for an advanced form of leetification (eg: πainting instead of painting), others can simply be used in complicated passphrases (eg: to the ∞ and beyond™).

zxcvbn doesn't increase the cardinality for the usage of such characters. I think it should, and it's quite easy: any character >0x7f should be considered part of a different character set. I suggest to increase the cardinality by about 100 (which is the number of special characters I can make with my keyboard and my operating system).
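
A rough sketch of the suggested rule, written as a stand-alone function rather than as a patch to calc_bruteforce_cardinality. The function name is hypothetical, the 26/26/10/33 class sizes follow the ASCII scheme described above, and the extra 100 is the value proposed in this report.

```javascript
// Hypothetical re-implementation for illustration; not zxcvbn's own code.
function bruteforceCardinality(password) {
  let lower = false, upper = false, digits = false, symbols = false, nonAscii = false;
  for (const ch of password) {
    if (ch.codePointAt(0) > 0x7f) nonAscii = true;   // proposed extra character set
    else if (ch >= 'a' && ch <= 'z') lower = true;
    else if (ch >= 'A' && ch <= 'Z') upper = true;
    else if (ch >= '0' && ch <= '9') digits = true;
    else symbols = true;                             // any other ASCII character
  }
  return (lower ? 26 : 0) + (upper ? 26 : 0) + (digits ? 10 : 0) +
         (symbols ? 33 : 0) + (nonAscii ? 100 : 0);
}

console.log(bruteforceCardinality('painting')); // 26
console.log(bruteforceCardinality('πainting')); // 126
```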

I am not sure how, or whether, we should also attempt to detect leetification with Unicode characters. Since it's far from common, I think that detecting it and giving it just 1 bit of entropy would be an underestimation, but I'm open to ideas.

I can provide a patch if you nod on the general idea.

"leet" substitution dictionary is lacking.

As stated your dictionary is well... lacking. It's not including a couple of characters that I've seen in the "hacker's leet" from my calculator days.

Since I absolutely despise coffee, I haven't forked it and merged it back in, but here they are so that you or anyone else can add them.

d,D = )
p,P = 6
it's a couple of characters but still.
ck=x
ers=orz

But anyway, those are the only ones that I've seen missing from your system. Since it's supposed to use the "substitutions" that people use, and I myself have used them, and lots of people I know have too, your meter would benefit from having them. At least the first two. All of them would be best, though I'd think the longer ones would require a bit more work for your meter to use. Anyway, it looks interesting. You also have some that I've never heard of anyone using, so these would at the very least help it.

Publish it to npm

Please. When you already support Bower, then why not npm too?

Very near other fields match

I'm currently implementing this on a site, and several users found that they can enter their email address without the @ sign as their password. I pass the email in the input fields argument. I'm not sure if this needs a stronger check; for instance, a user might use a combination of their name and company for the password. So perhaps any instance of another field found anywhere in the password should reduce the strength, and a password that matches the email without the @ sign should fail?

Single characters match dictionaries

It's quite strange that single characters match dictionaries:
a has an entropy of 2.33 whereas i has an entropy of 1 because it is matched by the dictionary.

The same problem exists with single digits:
1 (entropy = 2) and 4 (entropy = 3.32) match a dictionary, whereas the other single digits have an entropy of 5.42.

user_input breaks subsequent calls that don't pass user_input

console.log( zxcvbn( 'iandunn' ) );
Object {password: "iandunn", entropy: 14.892, match_sequence: Array[3], crack_time: 1.52, crack_time_display: "instant"…}   
console.log( zxcvbn( 'iandunn', [ 'iandunn' ] ) );
Object {password: "iandunn", entropy: 0, match_sequence: Array[1], crack_time: 0, crack_time_display: "instant"…}   
console.log( zxcvbn( 'iandunn' ) );
Object {password: "iandunn", entropy: 0, match_sequence: Array[1], crack_time: 0, crack_time_display: "instant"…}

The third call should return the exact same results as the first call, but instead it returns the results of the 2nd.
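
The expectation can be turned into a small regression check. This is only a sketch: `leaksUserInputs` and both stand-in estimators are hypothetical, and the stand-ins merely imitate the reported behaviour without reflecting how zxcvbn is implemented.

```javascript
// Returns true when a call with user inputs changes the result of a later call without them.
// `estimate` is any function shaped like estimate(password, userInputs) -> { entropy }.
function leaksUserInputs(estimate, password) {
  const before = estimate(password).entropy;
  estimate(password, [password]);            // call that supplies user inputs
  const after = estimate(password).entropy;
  return before !== after;
}

// Stand-in with the reported symptom: user inputs end up in shared state.
const shared = [];
function leakyEstimate(password, userInputs) {
  if (userInputs) shared.push(...userInputs);
  return { entropy: shared.includes(password) ? 0 : 14.892 };
}

// Stand-in that only looks at the inputs of the current call.
function cleanEstimate(password, userInputs = []) {
  return { entropy: userInputs.includes(password) ? 0 : 14.892 };
}

console.log(leaksUserInputs(leakyEstimate, 'iandunn')); // true
console.log(leaksUserInputs(cleanEstimate, 'iandunn')); // false
```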

Cannot find module './zxcvbn/zxcvbn' (coffee script problem)

This module needs to ship with prebuilt JavaScript code, otherwise it doesn't work without CoffeeScript.
The simplest way to do this would be to call a build step from within a prepublish script in the package.json.

here is the error I get

$ npm install zxcvbn
$ node
> require('zxcvbn')
Error: Cannot find module './zxcvbn/zxcvbn'
    at Function.Module._resolveFilename (module.js:336:15)
    at Function.Module._load (module.js:278:25)
    at Module.require (module.js:365:17)
    at require (module.js:384:17)
    at Object.<anonymous> (/home/dominic/c/experiments/node_modules/zxcvbn/index.js:1:80)
    at Module._compile (module.js:460:26)
    at Object.Module._extensions..js (module.js:478:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
    at Module.require (module.js:365:17)

Check for exports first.

It would be nice if the zxcvbn function were exported, which currently doesn't happen if window is available. I think exports should be checked before window so we don't pollute the global namespace.
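
A sketch of the suggested ordering, under the assumption that the `zxcvbn_load_hook` call made by the existing code should be kept for the browser case. The function `attach` and its explicit `exportsObj`/`windowObj` parameters are hypothetical and exist only so the example can run anywhere.

```javascript
// Attach `api` to CommonJS-style exports when available, otherwise to the window object.
function attach(api, exportsObj, windowObj) {
  if (typeof exportsObj !== 'undefined' && exportsObj !== null) {
    exportsObj.zxcvbn = api;                 // module system present: leave the global alone
  } else if (typeof windowObj !== 'undefined' && windowObj !== null) {
    windowObj.zxcvbn = api;
    if (typeof windowObj.zxcvbn_load_hook === 'function') {
      windowObj.zxcvbn_load_hook();
    }
  }
}

// Both objects available: only the exports object receives the function.
const fakeExports = {};
const fakeWindow = {};
attach(function zxcvbn() {}, fakeExports, fakeWindow);
console.log(typeof fakeExports.zxcvbn, typeof fakeWindow.zxcvbn); // function undefined
```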

Currently it is:

"undefined" !== typeof window && null !== window ? (window.zxcvbn = o, "function" === typeof window.zxcvbn_load_hook && window.zxcvbn_load_hook()) : "undefined" !== typeof exports && null !== exports && (exports.zxcvbn = o)

spaced passwords add too much entropy.

a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9
entropy:417.915
crack_time(seconds):3.190943210320069e+121
crack_time(readable):centuries
score:4

abcdefghijklmnopqrstuvwxyz0123456789
entropy:14.118
crack_time(seconds):0.889
crack_time(readable):instant
score:0

a b c 1 2 3
entropy:59.299
crack_time(seconds):35452087835576.43
crack_time(readable):centuries
score:4

entropy:3.807
crack_time(seconds):0.001
crack_time(readable):instant
score:0

Just adding a space between every letter does not make it all that much more secure...

Add header to minified JS files.

There's nothing in the zxcvbn.js file to identify what the project, license or version is.

It would be useful to have a simple header included, such as:

/*
    zxcvbn - realistic password strength estimation
    Updated: 20130203032247
    Project: https://github.com/lowe/zxcvbn
    License: MIT
*/

cardinality bug?

https://dl.dropboxusercontent.com/u/209/zxcvbn/test/index.html
The cardinality of temppass22 is 69.
Shouldn't it be 36 (lower case + digits = 26 + 10)?

license question

nvm, I found the apache license, it wasn't in the README, which is where I expected it.

Adoption by Drupal & i18n considerations

This code is being considered for adoption in Drupal 8: https://drupal.org/node/1497290

There are concerns though about how English focused the dictionaries are.

It would also be good if the lists of strings could be updated regularly.

I assume too that there is support for all UTF8 characters.

Browser unresponsive for long passwords (WordPress)

WordPress has an issue where if a long password (500+ characters) is checked for password strength, the browser becomes unresponsive for many seconds or minutes: See issue #31772 for the details.

Possible solutions

Use Web Workers to do the strength checking
- Would this be a welcome contribution?
Enhance zxcvbn to stop checking strength once a specified threshold of entropy is reached.
- What would the default threshold be?
Only use zxcvbn if the password is no more than 32 characters long
- The problem with this is that weak passwords can be longer than 32 characters. E.g. 33 zeroes: 000000000000000000000000000000000
Only use zxcvbn on the first 32 characters
- Would this make zxcvbn less effective?
Improve the performance of the strength-checking for longer passwords
- Is this even possible without significantly impacting its accuracy?
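
As an illustration of the "first 32 characters" option from the list above, here is a minimal sketch. `estimateBounded` is a hypothetical wrapper and `fakeEstimate` is a stand-in for zxcvbn, so this shows only the truncation itself and says nothing about how accuracy would be affected.

```javascript
// Run `estimate` on at most `maxLength` characters so the cost stays bounded.
// `estimate` stands in for zxcvbn; 32 is the limit suggested in the list above.
function estimateBounded(password, estimate, maxLength = 32) {
  const truncated = password.length > maxLength;
  const result = estimate(password.slice(0, maxLength));
  return { result, truncated };
}

// Stand-in estimator that only records how many characters it was asked to analyse.
const fakeEstimate = (pw) => ({ analysedLength: pw.length });

const out = estimateBounded('0'.repeat(500), fakeEstimate);
console.log(out.result.analysedLength, out.truncated); // 32 true
```

Returning the `truncated` flag lets the caller decide how to treat very long inputs instead of silently scoring only a prefix.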

uk keyboard layout

A possible feature would be to have both a UK and a US keyboard layout in the adjacency graph. At the moment !"£$%^&*() gets a score greater than 1, which is probably wrong because it's a fairly simple sequence.

I forked it and tried to implement it myself, but the graph generation script doesn't play nice with the £ sign because it's a Unicode character and all the other characters are ASCII (i.e., len(3£) returns 3 instead of 2 because the £ is counted in bytes).
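
The byte-versus-character mismatch can be shown in a few lines. The graph generation script itself is Python, so this JavaScript sketch only illustrates the same distinction, assuming the string is encoded as UTF-8. Both helper names are hypothetical.

```javascript
// Count characters (code points) rather than bytes.
function charCount(s) {
  return [...s].length;
}

// Count the bytes of the UTF-8 encoding of the same string.
function utf8ByteCount(s) {
  return new TextEncoder().encode(s).length;
}

console.log(charCount('3£'));     // 2
console.log(utf8ByteCount('3£')); // 3, because '£' takes two bytes in UTF-8
```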

Another workaround for this could just be to add !"£$%^&*() to one of the dictionaries of penalised words.