This repository has been migrated to https://github.com/mozilla/fxa/tree/master/packages/fxa-auth-db-mysql
Please file issues and open pull requests against https://github.com/mozilla/fxa
DEPRECATED - Migrated to https://github.com/mozilla/fxa
License: Mozilla Public License 2.0
Please file issues and open pull requests against https://github.com/mozilla/fxa
I was looking at what was happening when auth-db-mysql was using the wrong version of auth-db-server. Right here, we treat any 404 as meaning 'password mismatch'.
Is 404 the right code for a password mismatch? Should it return a sub-code on the wire to indicate more precisely what is wrong?
(p.s., it's hard to figure out where to put bugs sometimes in cases like these where the 404 is consumed by one repo but produced by another. I started writing this in fxa-auth-server, but realized it was a question about the api).
While digging to our use of the verifierSetAt
field, I noticed that at account-creation time we explicitly pass in a timestamp from the auth-server:
https://github.com/mozilla/fxa-auth-db-mysql/blob/master/db/mysql.js#L177
While at account-reset time we allow the auth-db-server to generate its own timestamp:
https://github.com/mozilla/fxa-auth-db-mysql/blob/master/db/mysql.js#L425
It's likely nothing bad will come of this, but we should probably be consistent. My vote would be to pass in all timestamps from the auth-server since that's what we're already doing in the majority of cases.
I woke up thinking about this PR comment:
Thanks for a keen eye
The thing is, a keen eye had nothing to do with it. The reason I spotted those oversights was because I specifically looked for them. And the reason I specifically looked for them was because I make the exact same oversights every single time I write a new migration. I don't think this is a coincidence.
The two oversights are:
The docs are really long. Using marks and split windows in vim makes them a bit more navigable but its still easy to lose track of where you are or where related sections are.
They also repeat themselves. Every section looks identical at a glance, so it's really easy to scroll past the end of the section you're supposed to be working on and start editing the wrong bit. I do this a lot.
The two documents for the server API and the database API express identical information for different layers. It's really easy to remember to update one of them but not the other. This is exacerbated by them living in different directories; it's too easy not to notice that you forgot to edit one.
Some ideas:
There is no automation around applying and testing the rollback migration, so it is really easy to leave copy-paste errors behind. Especially the patch level, but also to a lesser extent stored procedure versions.
The easiest way to create this migration is copy-pasting an old one to use as a template. Then you make your changes and use MySQL Workbench to make sure that it applies cleanly and the database seems correct with it applied. But you can't (easily) run the tests against it at this point because they and all of the code still expect the db to be in the pre-rollback state. So you tell yourself it looks okay and there's a code review that will double-check it for you.
Some ideas:
DROP
for every CREATE
etc and the correct patch level. It wouldn't be able to write a complete migration (unless we introduced some serious logic that went through all the migrations looking for earlier versions of stored procedures etc), but it would be enough of a start to avoid the common pitfalls.master
, then applies the rollback migration, then runs the tests in master
to make sure they all still work?Part of mozilla/fxa#45.
This work will actually be done in mozilla-services/puppet-config
, but I'm raising the issue here to give it visibility in our planning process.
Post- #85 mozilla/fxa-auth-server#1027, we'll need a new Heka filter to turn the account activity metrics into datapoints for the new retention graph(s).
I'm not certain that all of these are okay for retirement but a peek at my local database shows the following stored procedures have newer versions:
account_2
createAccount_2
createSessionToken_1
deleteAccount_4
emailRecord_2
passwordChangeToken_1
sessionToken_1
sessionToken_2
(Migrated to this repo from mozilla/fxa-auth-db-server#101)
As noted in mozilla/fxa-auth-server#785 (comment) it's generally a good idea, where possible, to have all your tranasctions touch tables in a consistent order. This reduces the possibility of deadlocks due to two threads waiting on a table lock held by the other.
It seems like a pretty small change that could reduce a small-but-definitely-there source of errors in production.
The MySQL driver succeeds whereas the memory driver fails with a 404 (not found).
mozlogify all the things.
Once we have mozilla/fxa-auth-server#983 and mozilla/fxa-auth-server#988, we'll be able to make a pretty accurate assessment of "number of users with multiple devices connected". I'm hoping that we can efficiently pull this out of the database on a daily basis and add it onto our dashboards.
We currently emit some daily metrics reports via the script here:
But I think that logic should live over here next to the code, and should probably grow some tests etc eventually. Perhaps we can do that move as part of this work.
Anyway, to the existing "count of all accounts" and "count of verified accounts", let's add "count of users with 2 or more devices" and "count of users with 3 or more devices".
We'll need to use the sessionTokens
table for this, and we should probably exclude sessions that haven't been active in the last month. So the thing we want is "number of uids that have more than X sessionTokens that were active in the last month".
There's an index on uid
so I'm hoping we can do it with an efficient SQL query, but my SQL-fu is not super strong. It'll be something like this totally-untested SQL:
SELECT COUNT(uid) FROM (
SELECT uid FROM sessionTokens
GROUP BY uid
HAVING COUNT(tokenId) > 1
)
But we'd have to check whether it can be run efficiently enough that we're happy doing it on our production db on a daily basis.
Part of mozilla/fxa#45.
This work will actually be done in mozilla/accounts-dashboard
, but I'm raising the issue here to give it visibility in our planning process.
When #86 is in place, we can implement the new retention graph plotting account activity at days 0, 1, 3, 7, 14, 30, 60 and 90.
Part of mozilla/fxa#45.
We need to emit events corresponding to activity on a given day for each account, to be processed by Heka as food for the retention graph.
Running npm test on master produces this chirp. I don't think the linter complexity tests have ever uncovered any real bugs, and mostly just whine about ||
defaulting. Can we remove the linting for "complexity"?
Running "eslint:files" (eslint) task
lib/db/mem.js
306:34 warning Function 'anonymous' has a complexity of 7 complexity
✖ 1 problem (0 errors, 1 warning)
I was just perusing the code here:
https://github.com/mozilla/fxa-auth-db-mysql/blob/master/db/mysql.js#L628
And AFAICT the getConnection
method ignores its "name" argument. Something doesn't seem right there. @chilts can you please sanity-check this?
From https://github.com/mozilla/fxa-auth-server/pull/876/files#r32576784, this would enable the /password/change/finish
endpoint to send a notification email with one fewer database hits.
During load tests, there was at least one episode where a series of timeouts happened on the fxa-auth servers, followed by a series of 500 errors. I'm not sure exactly what the root cause was but roughly something caused some PUT /sessionToken/ to block, and then the auth-server apparently retried these PUT requests. The retry of the PUT's resulted in 409 Conflict due to duplicate key when the first PUT was unblocked.
STR:
1 - Drive traffic into a test environment
2 - In mysql: lock tables sessionTokens write; select sleep(11); unlock tables;
Anyways, we should look into whether we should be disabling retires in poolee
, or using a retryFilter that allows known idempotent queries to retry but disallows others.
Noticed that the auth-db-mysql server on an fxa-dev is filling its logs with the exception below.
I believe that this is due to the change from p-promise to bluebird, and that this function should ensure that it always returns a promise on any path (events or no events), (P.resolve()
?).
Possibly unhandled TypeError: Cannot call method 'then' of undefined
at query.then.then.log.error.op (/data/fxa-auth-db-mysql/lib/db/mysql.js:734:20)
at tryCatch1 (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/util.js:43:21)
at Promise$_callHandler [as _callHandler] (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:627:13)
at Promise$_settlePromiseFromHandler [as _settlePromiseFromHandler] (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:641:18)
at Promise$_settlePromiseAt [as _settlePromiseAt] (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:800:14)
at Promise$_settlePromises [as _settlePromises] (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:934:14)
at Promise.b (domain.js:183:18)
at Async$_consumeFunctionBuffer [as _consumeFunctionBuffer] (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/async.js:75:12)
at Async$consumeFunctionBuffer (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/async.js:38:14)
at process._tickDomainCallback (node.js:492:13)
With Multi-AZ RDS Failover, from a client connection view, what will happen is that the current master database will shut down, terminating the TCP connection, then DNS will propagate a new IP for the (new) master db, and the database connection should reconnect.
In my tests in stage, under load, this only happened once in 4 tries.
I can provide further details, setup assistance as needed.
This should be a priority to fix to ensure minimal downtime.
(Filing this to capture the work as part of our planning process).
We're hoping to land some new metrics-gathering scripts in #72, but they're heavy queries that will make the auth db work hard. Let's try them out in stage and get a feel for how they might impact production traffic.
tried the following:
# Using Ubuntu
curl -sL https://deb.nodesource.com/setup_0.10 | sudo -E bash -
sudo apt-get install -y nodejs
afterwards I checked out the code
git clone https://github.com/mozilla/fxa-auth-db-mysql.git
cd fxa-auth-db-mysql/
fxa@vps229008:~/myGIT/fxa-auth-db-mysql$ npm install
npm WARN deprecated This version of npm lacks support for important features,
npm WARN deprecated such as scoped packages, offered by the primary npm
npm WARN deprecated registry. Consider upgrading to at least npm@2, if not the
npm WARN deprecated latest stable version. To upgrade to npm@2, run:
npm WARN deprecated
npm WARN deprecated npm -g install npm@latest-2
npm WARN deprecated
npm WARN deprecated To upgrade to the latest stable version, run:
npm WARN deprecated
npm WARN deprecated npm -g install npm@latest
npm WARN deprecated
npm WARN deprecated (Depending on how Node.js was installed on your system, you
npm WARN deprecated may need to prefix the preceding commands with `sudo`, or if
npm WARN deprecated on Windows, run them from an Administrator prompt.)
npm WARN deprecated
npm WARN deprecated If you're running the version of npm bundled with
npm WARN deprecated Node.js 0.10 LTS, be aware that the next version of 0.10 LTS
npm WARN deprecated will be bundled with a version of npm@2, which has some small
npm WARN deprecated backwards-incompatible changes made to `npm run-script` and
npm WARN deprecated semver behavior.
> [email protected] install /var/fxa/myGIT/fxa-auth-db-mysql/node_modules/restify/node_modules/dtrace-provider
> node scripts/install.js
npm WARN engine [email protected]: wanted: {"node":">=0.10.40","npm":">=2.0.0"} (current: {"node":"0.10.41","npm":"1.4.29"})
npm WARN engine [email protected]: wanted: {"node":">=0.12.0"} (current: {"node":"0.10.41","npm":"1.4.29"})
[email protected] node_modules/grunt-copyright
[email protected] node_modules/eslint-config-fxa
[email protected] node_modules/clone
[email protected] node_modules/proxyquire
├── [email protected]
└── [email protected] ([email protected], [email protected])
[email protected] node_modules/grunt-bump
└── [email protected]
[email protected] node_modules/bluebird
[email protected] node_modules/mozlog
├── [email protected]
└── [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected], [email protected])
[email protected] node_modules/mysql
├── [email protected]
└── [email protected] ([email protected], [email protected], [email protected], [email protected])
[email protected] node_modules/sinon
├── [email protected]
├── [email protected]
├── [email protected]
└── [email protected] ([email protected])
[email protected] node_modules/mysql-patcher
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected])
└── [email protected]
[email protected] node_modules/request
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected] ([email protected])
├── [email protected]
├── [email protected] ([email protected])
├── [email protected] ([email protected])
├── [email protected] ([email protected], [email protected], [email protected])
├── [email protected] ([email protected])
└── [email protected] ([email protected], [email protected], [email protected], [email protected])
[email protected] node_modules/fxa-jwtool
├── [email protected]
├── [email protected] ([email protected])
└── [email protected] ([email protected])
[email protected] node_modules/load-grunt-tasks
├── [email protected] ([email protected], [email protected], [email protected])
└── [email protected] ([email protected], [email protected])
[email protected] node_modules/restify
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected] ([email protected])
├── [email protected]
├── [email protected]
├── [email protected] ([email protected])
├── [email protected]
├── [email protected]
├── [email protected] ([email protected])
├── [email protected] ([email protected], [email protected])
├── [email protected] ([email protected], [email protected], [email protected], [email protected])
├── [email protected] ([email protected])
└── [email protected] ([email protected], [email protected])
[email protected] node_modules/nock
├── [email protected]
├── [email protected] ([email protected])
├── [email protected] ([email protected])
├── [email protected] ([email protected], [email protected])
└── [email protected]
[email protected] node_modules/grunt
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected] ([email protected])
├── [email protected] ([email protected], [email protected])
├── [email protected]
├── [email protected] ([email protected], [email protected])
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected] ([email protected], [email protected])
├── [email protected] ([email protected], [email protected])
└── [email protected] ([email protected], [email protected], [email protected])
[email protected] node_modules/convict
├── [email protected]
├── [email protected]
├── [email protected] ([email protected], [email protected])
├── [email protected] ([email protected], [email protected])
├── [email protected]
└── [email protected] ([email protected])
[email protected] node_modules/ass
├── [email protected]
├── [email protected] ([email protected])
├── [email protected] ([email protected], [email protected], [email protected], [email protected])
└── [email protected] ([email protected], [email protected], [email protected])
[email protected] node_modules/grunt-nsp
└── [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected])
[email protected] node_modules/tap
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected]
├── [email protected] ([email protected], [email protected])
├── [email protected] ([email protected])
├── [email protected] ([email protected])
├── [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected], [email protected])
├── [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected])
├── [email protected] ([email protected], [email protected])
├── [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected])
├── [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected])
├── [email protected] ([email protected], [email protected], [email protected])
└── [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected])
[email protected] node_modules/grunt-eslint
├── [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected])
└── [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected])
When you depend on nodejs 0.10, why do I run into errors.
Can you please document dependencies.
Thank you very much
ref #147 (comment)
Following up mozilla/fxa-auth-db-server#88,
we don't need to return the verifyHash field since the password verification is done in the db server now.
I identified 2 API endpoints that returns verifyHash : GET /emailRecord and GET /account.
DB support for device API: https://gist.github.com/dannycoates/44b1965a6be42060e15f
Let's provide an API for listing all session tokens active on an account. I'm thinking as simple as:
GET /account/:id/sessions
=> [ {session data}, {session data}, ...]
Most immediately, we'll want to be able to determine whether a user is single-device or multi-device, which can be accomplished by counting their session tokens. We'll eventually want to display session tokens and their metadata on a control dashboard.
One thing to be careful of: we probably dont want to return the token's secret data in this query, to guard against accidental leakage. You should only get at that data if you look up a session directly by its id.
As seen in the following (infra private, sorry!) bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1138660#c2
We are creating new stored procedures with each db migration, but we don't ever clean up the old ones. We now have procedures forgotPasswordVerified_1
, forgotPasswordVerified_2
and forgotPasswordVerified_3
in the db.
We should come up with a scheme for cleaning up old versions once we're confident they'll no longer be needed.
Removed in auth-server: mozilla/fxa-auth-server#1337 (comment)
Let's try to delete the endpoints and maybe the db stuff.
Sometimes /account/reset
starts failing for no apparent reason. It seems to fail more often on @shane-tomlinson's machine than on mine. Once it starts to fail, it continues doing so until the db server is restarted. Re-starting the auth server at that point has no effect.
This has been happening for a long time, but may have become more frequent recently. We've only seen it happen when running against the memory backend, although that doesn't necessarily mean the problem is there.
In the auth server logs, the uid
in the failing request payload is always a null byte / 00
. In the db server logs, there is a SyntaxError: Unexpected token ILLEGAL
.
auth server log:
db server log:
db server syntax error:
➜ fxa-auth-db-mysql git:(master) npm test
> [email protected] test /Users/stomlinson/development/fxa-auth-db-mysql
> grunt && node ./bin/db_patcher.js >/dev/null && ./scripts/tap-coverage.js test/backend test/local
Running "jshint:files" (jshint) task
>> 23 files lint free.
Running "copyright:app" (copyright) task
Running "copyright:tests" (copyright) task
Done, without errors.
{"name":"db-patcher","hostname":"Shanes-Retina-MBP.local","pid":81159,"level":30,"msg":"Database patched to level 7","time":"2015-02-04T17:05:57.347Z","v":0}
ok test/backend/db_tests.js ......................... 135/135
404 '/account/hello-world'
404 '/emailRecord/3639623637316162613462663936643435396238393936393935373631646165406578616d706c652e636f6d'
404 '/sessionToken/0df4ec7f3dbea05cf365ac9b9f62b645cb52ed630c452b73a48797ad08585e6c'
404 '/sessionToken/0df4ec7f3dbea05cf365ac9b9f62b645cb52ed630c452b73a48797ad08585e6c'
404 '/keyFetchToken/3bde20c4ac621055d0e37c1e798fa4b1395d35a1b1aadd55155f37520cedc63a'
404 '/keyFetchToken/3bde20c4ac621055d0e37c1e798fa4b1395d35a1b1aadd55155f37520cedc63a'
404 '/accountResetToken/0ec2d9f2c6f9807d600fb49bd85940828b6b98abf3409d096ee71a23db3e9b73'
404 '/accountResetToken/0ec2d9f2c6f9807d600fb49bd85940828b6b98abf3409d096ee71a23db3e9b73'
404 '/passwordChangeToken/e4128f89d8f515f17ddbfb02cb803baedf282951a6bbe5e50aae8ea33849f528'
404 '/passwordChangeToken/e4128f89d8f515f17ddbfb02cb803baedf282951a6bbe5e50aae8ea33849f528'
404 '/passwordForgotToken/42985d3970a2a9608d2eb90b07b2632a71d151659c84a0d154344bb365b82e90'
404 '/passwordForgotToken/42985d3970a2a9608d2eb90b07b2632a71d151659c84a0d154344bb365b82e90'
404 '/passwordForgotToken/b080f8dcf51095cafad246807469f47e33b488506c2b899c5e47f7b6cf47ed51'
ok test/backend/remote.js ........................... 128/128
ok test/local/incorrect-patch-level.js .................. 4/4
{"name":"db-api","hostname":"Shanes-Retina-MBP.local","pid":81165,"level":30,"op":"MySql.connect","patchLevel":7,"patchLevelRequired":7,"msg":"","time":"2015-02-04T17:05:59.487Z","v":0}
ok test/local/log-stats.js .............................. 5/5
not ok test/local/mysql_tests.js ...................... 32/34
Command: "/Users/stomlinson/development/nvm/v0.10.25/bin/node mysql_tests.js"
TAP version 13
ok 1 Got the ping ok
ok 2 we have an error
ok 3 should be equal
ok 4 should be equal
ok 5 should be equal
ok 6 should be equal
ok 7 we have an error
ok 8 should be equal
ok 9 should be equal
ok 10 should be equal
ok 11 should be equal
ok 12 we have an error
ok 13 should be equal
ok 14 should be equal
ok 15 should be equal
ok 16 should be equal
ok 17 we got an error
ok 18 should be equal
ok 19 should be equal
ok 20 should be equal
ok 21 should be equal
ok 22 we got an error
ok 23 should be equal
ok 24 should be equal
ok 25 should be equal
ok 26 should be equal
ok 27 the function was retried
ok 28 Drop procedure was successful
ok 29 The stored procedure creation was successful
ok 30 The call to the stored procedure failed as expected
not ok 31 error stringified is correct
---
file: /Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/util.js
line: 43
column: 21
stack:
- |
getCaller (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-assert.js:418:17)
- |
assert (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-assert.js:21:16)
- |
Function.equal (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-assert.js:162:10)
- |
Test._testAssert [as equal] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-test.js:87:16)
- |
/Users/stomlinson/development/fxa-auth-db-mysql/test/local/mysql_tests.js:204:17
- |
tryCatch1 (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/util.js:43:21)
- |
Promise$_callHandler [as _callHandler] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:627:13)
- |
Promise$_settlePromiseFromHandler [as _settlePromiseFromHandler] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:641:18)
- |
Promise$_settlePromiseAt [as _settlePromiseAt] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:800:14)
- |
Promise$_settlePromises [as _settlePromises] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:934:14)
found: |
Error: ER_NO_DEFAULT_FOR_FIELD
wanted: |
Error: ER_BAD_NULL_ERROR
diff: |
FOUND: Error: ER_NO_DEFAULT_FOR_FIELD
WANTED: Error: ER_BAD_NULL_ERROR
^ (at position = 10)
...
ok 32 error code is correct
not ok 33 error errno is correct
---
file: /Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/util.js
line: 43
column: 21
stack:
- |
getCaller (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-assert.js:418:17)
- |
assert (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-assert.js:21:16)
- |
Function.equal (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-assert.js:162:10)
- |
Test._testAssert [as equal] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-test.js:87:16)
- |
/Users/stomlinson/development/fxa-auth-db-mysql/test/local/mysql_tests.js:206:17
- |
tryCatch1 (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/util.js:43:21)
- |
Promise$_callHandler [as _callHandler] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:627:13)
- |
Promise$_settlePromiseFromHandler [as _settlePromiseFromHandler] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:641:18)
- |
Promise$_settlePromiseAt [as _settlePromiseAt] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:800:14)
- |
Promise$_settlePromises [as _settlePromises] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:934:14)
found: 1364
wanted: 1048
...
ok 34 test/local/mysql_tests.js
1..34
# tests 34
# pass 32
# fail 2
ok test/local/prune_tokens.js ......................... 11/11
total ............................................... 315/317
not ok
code coverage: 92.7%
generating coverage.html: complete in 0.4s
npm ERR! Test failed. See above for more details.
npm ERR! not ok code 0
➜ fxa-auth-db-mysql git:(master)
The sha is 1fcf15c
I'm sure this is operator error, but I haven't figured out where I am going wrong.
The lazy way to file a bug report:
[03-16 17:31:03] jrgm I was looking at logging in fxa-auth-db-mysql.
[03-16 17:31:18] jrgm Some log lines are like this - https://github.com/mozilla/fxa-local-dev
[03-16 17:31:20] jrgm oops
[03-16 17:31:33] jrgm Some log lines are like this - https://github.com/mozilla/fxa-auth-db-mysql/blob/master/lib/db/mysql.js#L119-L122
[03-16 17:31:55] jrgm And others like this (mostly on error paths) - https://github.com/mozilla/fxa-auth-db-mysql/blob/master/lib/db/mysql.js#L774
[03-16 17:32:24] jrgm seanmonstar: the former style is correct, and the latter, well, borked, right?
[03-16 17:32:56] seanmonstar The latter is what was used prior to mozlog
[03-16 17:33:24] jrgm yeah, I see in history where some, but not all, were converted.
[03-16 17:33:34] seanmonstar The first argument should be the String op
[03-16 17:35:06] jrgm Or else you get `"Type":"bin.server.[object Object]"`
[03-16 17:37:52] seanmonstar To catch that in tests, the mozlog config accepts a 'debug: true' param
[03-16 17:38:04] seanmonstar That will assert a few things about the logs
re: https://bugzilla.mozilla.org/show_bug.cgi?id=1220397
There is a significant leak in train-48 in the auth-db-server, which required us to rollback to train-47 version of auth-db/auth.
I don't see a leak in the actual changes in train-48. I set up some some branches where I undid the module version updates, then selectively updated each separately and ran some load tests. There is noise in these numbers, but the update to bluebird (and how we use it?) looks to be a good candidate as to what is doing (most of?) the leaking.
RSS end Pct. change RSS post Pct. change
of test forced GC
48-no-module-updates-#1 (10min) 89412 1.0% 74188 6.6%
48-no-module-updates-#2 (10min) 89780 1.5% 68896 -1.0%
48-no-module-updates-#3 (10min) 86268 -2.5% 65756 -5.5%
48-no-module-updates-average 88487 0.0% 69613 0.0%
train-46 (10min) 89110 0.7% 71564 2.8%
48 update clone,convict,mozlog (10min) 89232 0.8% 68728 -1.3%
48 update only bluebird (10min) 93520 5.7% 74820 7.5%
48 update only fxa-jwtool (10min) 90984 2.8% 71076 2.1%
48 update only mysql (10min) 96404 8.9% 71124 2.2%
48 update only request (10min) 91404 3.3% 68968 -0.9%
48 update only restify (10min) 91760 3.7% 68804 -1.2%
48 update only mysql (1hr) 93360 5.5% 72516 4.2%
48 update all except mysql (1hr) 111764 26.3% 87728 26.0%
48 update only bluebird (1hr) 112016 26.6% 87256 25.3%
48 update all except bluebird (1hr) 96792 9.4% 72652 4.4%
Not sure how to proceed further in isolating the root cause of the leak. @dannycoates?
This is a follow-up to #56. There's leftover top-level-repo cruft in the fxa-auth-db-server repo, such as .gitignore files and travis config nd grunt tasks. We should move them out to the top level where possible, or delete them if not.
This is a follow-up to #56
It's got a few cleanups and improved error handling.
Back in #74, I did some jiggling about to get the base directory stuff from the old fxa-auth-db-server
repo into the base directory of this repo. One of those changes was to have the two changelogs side-by-side in CHANGELOG-db.md
and CHANGELOG-server.md
.
This broke our version script, which hasn't been updating the change logs since then. We should reinstate CHANGELOG.md
so that the script works correctly again.
Bonus points for:
Over in mozilla/fxa-auth-server#1384, we're removing createAccountResetToken
because it is never called, forgotPasswordVerified
is instead.
Both those methods map to separate endpoints and stored procedures in this repo. Should we remove the PUT /accountResetToken/:id
endpoint and the createAccountResetToken
stored procedure on the basis that the auth server never calls them?
The requirement for this comes from mozilla/fxa#45, which describes a user retention curve with points plotted at days 0, 1, 3, 7, 14, 30, 60 and 90.
Day 0 will always be 100% and, in our case, represents all accounts that have a lastAccessTime
set in the sessionTokens
table. Statistically, we can't make comparisons against accounts without a qualifying session because we don't know how long those users were retained. Thus the graph will end up being populated gradually until it is fully available, 90 days after the deployment of mozilla/fxa-auth-server#983.
The subsequent days represent the percentage of those users who were still active at that point and can be measured by comparing lastAccessTime
to createdAt
from the accounts
table.
Concretely, this is the kind of thing I'm thinking of in SQL terms (although there may be a more performant way to do it):
SELECT
COUNT(s0.lastAccessTime) AS d0,
COUNT(s1.lastAccessTime) AS d1,
COUNT(s3.lastAccessTime) AS d3,
COUNT(s7.lastAccessTime) AS d7,
COUNT(s14.lastAccessTime) AS d14,
COUNT(s30.lastAccessTime) AS d30,
COUNT(s60.lastAccessTime) AS d60,
COUNT(s90.lastAccessTime) AS d90
FROM accounts AS a
LEFT JOIN sessionTokens AS s0 ON a.uid = s0.uid
LEFT JOIN sessionTokens AS s1 ON a.uid = s1.uid
LEFT JOIN sessionTokens AS s3 ON a.uid = s3.uid
LEFT JOIN sessionTokens AS s7 ON a.uid = s7.uid
LEFT JOIN sessionTokens AS s14 ON a.uid = s14.uid
LEFT JOIN sessionTokens AS s30 ON a.uid = s30.uid
LEFT JOIN sessionTokens AS s60 ON a.uid = s60.uid
LEFT JOIN sessionTokens AS s90 ON a.uid = s90.uid
WHERE
s0.lastAccessTime > 0 AND
s1.lastAccessTime >= a.createdAt + 86400000 AND
s3.lastAccessTime >= a.createdAt + 259200000 AND
s7.lastAccessTime >= a.createdAt + 604800000 AND
s14.lastAccessTime >= a.createdAt + 1209600000 AND
s30.lastAccessTime >= a.createdAt + 2592000000 AND
s60.lastAccessTime >= a.createdAt + 5184000000 AND
s90.lastAccessTime >= a.createdAt + 7776000000;
@ckarlof, mentioning you in case you want to speak about this on Thursday.
With signin confirmation and email caphta we have more targeted
meant to keep bad people at bay while minimizing friction for valid
users, account lockout no longer seems particularly elegant.
Ref mozilla/fxa-content-server#3949
Ref mozilla/fxa-auth-server#1359
Arising from the discussion at mozilla/fxa-auth-server#1374 (comment), it has become clear that the binary state modelled by unverifiedTokens
is insufficient. Specifically, there is a class of tokens for which we would like to say "this token hasn't been verified and doesn't need to be".
@rfk suggested we could use a NULL
in the tokenVerificationId
column to imply the state. Alternatively, if we didn't fancy implying the state in that way, a migration to add an extra column should be quite quick to execute because records are expunged from that table as tokens get verified.
Might we consider deleting the fxa-auth-db-server
repo, renaming this repo to fxa-auth-db-server
and re-jiggling the directory structure / dependencies to eliminate the fxa-auth-db-server
sub-directory?
My thinking is that the mysqlness of this repo is both an implementation detail and not true when testing against db-mem
.
There's also a weird (imho) back-to-frontness about the server bit being a child of the actual database bit. The server is the outwardly-visible interface that everything else cares about, why should it live tucked away as a child of the bit that nobody interacts with directly?
And all of our other services are named fxa-xxx-server
so this one is going to seem pretty odd to people not familiar with the history.
Not to mention the fact that Shane's amazing high-level architecture diagram is now WRONG! 🙀
I guess an argument could be made that it's just a name and names don't matter, but I kind of think it does in this case.
This work will actually be done in mozilla-services/puppet-config
, but I'm raising the issue here to give it visibility in our planning process.
Now that the new metrics script is in master
, we need to update the puppet manifest to clone this repo and run the new script and then update the Heka filter to process the new metrics.
Now that the three old database repos have been merged into one, the readme files and documentation are wrong.
Just trying to capture a few options for the patcher and production to see if we can somehow automate this in a particular way but being safe about it as well.
mysql-patcher
in ProdThe current mysql-patcher
sends the entire patch file to mysql in one go, for it to execute every statement. Without some overseeing, some MySql statements (such as adding a column or an index) may not complete correctly. Since MySql doesn't have transactional DDL, this may have failed in the middle of a patch file and therefore would need manual inspection to check which statements have or haven't been applied correctly. Whether the patch level in dbMetadata
is or isn't updated correctly the database then differs from any patch.
mysql-patcher
to run each individual statement in each patch fileThis may work. It introduces some overhead to delimit the statements in the patch files so that the mysql-patcher
can send each statement to MySql separately. The patcher program could then keep re-trying (up to a certain limit) to make sure each statement is executed in turn. If any statement fails up to a maximum of times, it could then try to reverse the patch number. The ability to log more info related to which statement is being applied, the length of time it took and what failed would make this process more transparent. This would still require at least manual oversight to make sure nothing is going awry.
By having each statement in code means we wouldn't need to instrument any patch files with delimiters since they would already be separate. Again, each statement could be performed in turn with some retry logic and perhaps a patch reversal process if anything in the patch fails too many times. Again, manual oversight would also be required though the extra logging information would also help make sure it went smoothly.
@jrgm: Do either of the last two solutions appeal to you, or anything else? Have you got any other suggestions? /cc @rfk @ckolos. Thanks.
From email conversation with @jrgm:
"""
I was testing RDS Multi-AZ failover in stage under load on Friday.
The TL;DR is that fxa-auth does not reconnect on failover. Ugh.
fxa-oauth and fxa-profile did reconnect, and on a restart fxa-auth
reconnects. But on 3 of 4 occasions, fxa-auth wound up in a state where it
kept putting queries in the queue and returning 500 on 'Error: Queue limit
reached'.
"""
We do some mucking around with the error-recovery options, which may be affecting this.
From https://github.com/mozilla/fxa-auth-db-mysql/blob/master/lib/db/mysql.js#L34:
// poolCluster will remove the pool after `removeNodeErrorCount` errors.
// We don't ever want to remove a pool because we only have one pool
// for writing and reading each. Connection errors are mostly out of our
// control for automatic recovery so monitoring of 503s is critical.
// Since `removeNodeErrorCount` is Infinity `canRetry` must be false
// to prevent inifinite retry attempts.
this.poolCluster = mysql.createPoolCluster(
{
removeNodeErrorCount: Infinity,
canRetry: false
}
)
This work will actually take place in mozilla/accounts-dashboard
but I'm raising it here to give it visibility in our planning process.
We need to create new charts for the device metrics that we will soon be recording. To limit intrusiveness on other views, they should probably be grouped together into a new sub-dashboard for FxA.
/cc @eoger @jrgm; from #133 (comment)
In #133 we added a database migration that changes the column type for push callback pubkeys. @jrgm notes that it can't be deployed in production due to the change on the column type causing errors. We need to figure out a way forward for this before cutting train-62.
I'm hopeful that we can get around it by deploying two migrations - one that drops the pubkey stuff entirely and updates stored procedures to not reference that column at all, and a second that adds the pubkey stuff back in in the new format.
@jrgm does this sound like a reasonable approach to the problems you saw?
Comments like mozilla/fxa-auth-server#954 and #51 (comment) make me sad. The current factoring of these auth-db-* repos seems to be creating a lot of busywork and deployment footguns. Let's find a better way to do this.
I personally lean towards "squash them all back together into a single repo" like the way oauth-server does its backends, but am open to other suggestions :-)
STR
...
ok 177 account no longer exists for this email address
ok 178 unlockCode is deleted for this uid
ok 179 Returned an empty object for unlockAccount
not ok 180 Record already exists
...
which is https://github.com/mozilla/fxa-auth-db-mysql/blob/master/fxa-auth-db-server/test/backend/db_tests.js#L909-L937, presumably the constant value of the openid.
At about 2015-02-11T13:36Z, across all production fxa-auth-db servers, there were hundreds of 500 errors with this stack. I don't really have more information at this time. Things we need to look at are stackdriver/cloudwatch mysql stats; show engine innodb status
on prod.
Error: Queue limit reached.
at Pool.getConnection (/data/fxa-auth-db-server/node_modules/mysql/lib/Pool.js:64:15)
at PoolCluster._getConnection (/data/fxa-auth-db-server/node_modules/mysql/lib/PoolCluster.js:148:13)
at PoolNamespace.getConnection (/data/fxa-auth-db-server/node_modules/mysql/lib/PoolCluster.js:188:17)
at PoolCluster.getConnection (/data/fxa-auth-db-server/node_modules/mysql/lib/PoolCluster.js:80:13)
at promisified (eval at makeNodePromisifiedEval (/data/fxa-auth-db-server/node_modules/bluebird/js/main/promisify.js:195:12), :2:343)
at retryable (/data/fxa-auth-db-server/db/mysql.js:636:12)
at MySql.getConnection (/data/fxa-auth-db-server/db/mysql.js:605:12)
at MySql.singleQuery (/data/fxa-auth-db-server/db/mysql.js:490:17)
at MySql.read (/data/fxa-auth-db-server/db/mysql.js:575:17)
at MySql.readFirstResult (/data/fxa-auth-db-server/db/mysql.js:556:17)
Stupid error, this.
The metrics script appends JSON-serialised data to a file for later processing by the Heka filter, but it doesn't delimit between writes. So the file contains valid JSON for the first write and unparseable nonsense subsequently.
Instead it should either:
1
is obviously more efficient but it also requires changes to the Heka filter, which I'm not entirely comfortable making yet. I'm going to look into that, but may end up going with 2
if it seems hairy.
I noticed that a Kibana graph of auth-db errors (that only runs in production, not stage) had stopped updating with the release. I chatted with @whd and it was because the log lines were missing an HTTP 'code' value. After much digging and head scratching, the root cause is that the npm-shrinkwrap.json
file in fxa-auth-db-mysql
is specifically pinned to a git SHA in https://github.com/mozilla/fxa-auth-db-mysql/blob/train-31/npm-shrinkwrap.json#L286-L290. This obviously is pretty disturbing, and we can talk later about how to prevent this from occuring (i.e., tools to validate that package.json, node_modules and npm-shrinkwrap.json are in sync, and a grunt task that will consistently maintain npm-shrinkwrap.json
).
However, the "good" news is that the only actual code change that was dropped was this change, which is why the log line in production was missing.
I considered asking @ckolos to rollback to train-30 on auth-server, but have decided that the missing log line is "liveable". I think we should pick up the L10N strings and a fix for train-31 npm-shrinkwrap to get back to where we should be, and I'll work on that now.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.