lukasvermeer / srm

This Chrome Extension automatically performs SRM checks and flags potential data quality issues on supported experimentation platforms.

Home Page: https://lukasvermeer.nl/srm/

License: MIT License

JavaScript 97.63% HTML 2.00% Ruby 0.37%

chrome-extension google-optimize sample-ratio-mismatch srm statistics statistical-analysis data-quality ab-testing experimentation vwo

srm's Issues

Optimize: "Updated Reporting" breaks the Extension.

We should fix that. Also note that clicking the button does not change the URL, so the check is not rerun. 👎

Add LaunchDarkly to platforms which support SRM checks

Documentation.

Convert: auto-stopping variants triggers false SRM alert

There seems to be no way to determine whether a variant was stopped early, which causes false alarms. @geoprofi is discussing next steps with the Convert team.

Create logo

Needs to work for

Extension thumbnail in header
Website favicon
Website artwork (optional)

Cater to edge case where people set winning variant to 100%

@geoprofi points out a common edge case scenario which will lead to false SRM warnings:

A significant issue I see is that many people would just leave an Optimize experiment running with 100% of traffic going to the winning variant while the data continues to gather. This will, naturally, very quickly lead to SRM warnings for these tests.

The extension should simply refuse to do the check if this is the case.
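
A minimal sketch of such a guard, written in Python for illustration rather than in the extension's own JavaScript. The function name `should_run_srm_check` and the `tolerance` parameter are hypothetical, and how the configured traffic weights are read from the Optimize page is assumed, not shown.

```python
def should_run_srm_check(weights, tolerance=1e-9):
    """Return False when the configured split makes an SRM check meaningless.

    `weights` are the configured traffic weights per variant (assumed to be
    available from the platform; any positive scale works, e.g. 50/50 or 0.5/0.5).
    """
    total = sum(weights)
    if total <= 0:
        return False  # nothing configured, nothing to check
    shares = [w / total for w in weights]
    # A variant holding (almost) all traffic means the experiment is effectively
    # over; a variant with no traffic would have an expected count of zero.
    return all(tolerance < share < 1 - tolerance for share in shares)


print(should_run_srm_check([0.5, 0.5]))  # regular 50/50 experiment
print(should_run_srm_check([1.0, 0.0]))  # winner set to 100% of traffic
```

Refusing the check outright, instead of testing against the new weights, avoids comparing counts collected under the old split with expectations derived from the new one.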

Show analysis summary in extension popup

Should probably show:

Explanation
Values used
p-value
(more?)

Figure out how to serve the Jekyll site locally for testing

I've never actually tried to build the site locally. It uses GitHub Pages and a remote theme, so I'm not sure how to get that working locally. Should be possible. Doesn't seem urgent. Maybe we should create an issue?

Originally posted by @lukasvermeer in #34 (comment)

VWO: Red highlighting disappears when selecting different goal

The fix will listen for clicks on goal tabs and reapply the highlighting if an SRM was detected.

Extend microsite to support A/B/n testing.

As suggested by @geoprofi.

Exit survey after uninstall

We could ask users why they uninstalled the extension. This might help us identify issues or bugs.

https://developer.chrome.com/extensions/runtime#method-setUninstallURL

Add "deep dive" feature to help users identify likely root cause

I've been reluctant to let go of identifying problematic variants for too long. @geoprofi has convinced me. We should just check for SRM overall (see #16).

However, in addition, we could still try to provide an "auto deep dive" feature which tries to find likely causes (we could base it on the taxonomy maybe), but that would be a different thing from the simple check functionality.

Add support for Google Optimize

Reduce JS errors by running interval only when necessary

The errors might otherwise send people who use JS error reporting on an unnecessary bug hunt.

Add AB Tasty to list of platforms which support checks

Ronny suggested they support it. I'll need to find a screenshot.

Allow users to change the p-value threshold

This will be trivial technically, but will require some thought as to reasonable defaults and how we explain this to users.

Submit to Chrome Web Store

Adding tools with unknown domain / url format

Hello,

Great extension! I would like to add SiteSpect to the list of tools, but I'm running into an issue with the configuration.
Currently, the tools are recognized by their domain names.

But for SiteSpect, many customers run on-premise, with custom (sub)domains.

Would it be possible to add a feature like this:
When clicking on the extension icon, in the dropdown we could create a way to 'force' an SRM check of a specific tool, on the current page.

In this case, it won't do it automatically, but there is a possibility to verify the SRM. Also after manually triggering, for that user we could store the current domain as being that "tool" so it could automatically try to do it in the future.

Would like to hear your thoughts about this.

Kind regards,
Jonas

Unit tests for stats

@geoprofi suggested:

This tool made by me has been fairly rigorously tested and so any input there should return the correct output: https://www.gigacalculator.com/calculators/chi-square-calculator.php and so tables can be built around it. Similarly, the chisq.test() function in R can be used; it needs just two inputs, e.g. x <- c(1000,950,1050,1000); probs <- c(0.25,0.25,0.25,0.25); then chisq.test(x=x,p=probs);

https://www.gigacalculator.com/calculators/chi-square-calculator.php?test=goodnessoffit&data=1000+0.25%0D%0A950+0.25%0D%0A1050+0.25%0D%0A1000+0.25

x <- c(1000,950,1050,1000); probs <- c(0.25,0.25,0.25,0.25); chisq <- chisq.test(x=x,p=probs); chisq;
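
A reference-table test along these lines could be sketched as follows. This is Python for illustration only, not the extension's actual JavaScript, and the function names `chi_square_sf` and `srm_p_value` are hypothetical. The only reference value in the table is the one printed by scipy elsewhere on this page; further rows would come from R or the calculator above.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, x/2)
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def srm_p_value(observed, expected_shares):
    """Chi-square goodness-of-fit p-value for observed counts vs. expected shares."""
    total = sum(observed)
    expected = [total * share for share in expected_shares]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square_sf(statistic, len(observed) - 1)


# (observed counts, expected shares, reference p-value)
REFERENCE_CASES = [
    ([502, 498], [0.5, 0.5], 0.8993431885613663),  # scipy output quoted on this page
]

for observed, shares, reference in REFERENCE_CASES:
    assert abs(srm_p_value(observed, shares) - reference) < 1e-9

# The R example from this issue: the statistic is 5 with 3 degrees of freedom.
print(srm_p_value([1000, 950, 1050, 1000], [0.25, 0.25, 0.25, 0.25]))
```

Keeping the reference values in a plain table makes it cheap to add rows whenever a new edge case turns up.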

Microsite should show exact p-value

When the p-value is lower than the threshold, the microsite should still show the exact value. Currently it shows the threshold instead.

Fix how we check SRM in case of multiple variants

As discussed with @geoprofi we should:

check for SRM on the whole contingency table in one test, then
flag SRM for the whole experiment, then
let the user figure out manually which variant, if any single one, is the culprit

We should do some SEO

Obviously we should be the go-to resource for anyone searching for "SRM" or "Sample Ratio Mismatch". Seems Google doesn't realise that yet. We should help them.

Move iframe function out of platform sections

Chances are we'll need it for other platforms too.

Change extension icon based on check results

Rather than changing the page background to red ...

Use a one-proportion z-test instead of a chi-square

@keesmulder suggested we should use a one-proportion z-test instead of a chi-square.

For example:

from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

# Small imbalance: 502 vs. 498 users on an expected 50/50 split.
# Index [1] selects the p-value from each result.
stats.chisquare([502,498],[500,500])[1]
proportions_ztest(count=502, nobs=1000, value=0.5)[1]
stats.chisquare([502,498],[500,500])[1] - proportions_ztest(count=502, nobs=1000, value=0.5)[1]

# Large imbalance: 600 vs. 400 users on an expected 50/50 split.
stats.chisquare([600,400],[500,500])[1]
proportions_ztest(count=400, nobs=1000, value=0.5)[1]
stats.chisquare([600,400],[500,500])[1] -proportions_ztest(count=400, nobs=1000, value=0.5)[1]

Which yields:

python
Python 2.7.16 (default, Oct 10 2019, 22:02:15) 
[GCC 8.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from scipy import stats
>>> from statsmodels.stats.proportion import proportions_ztest
>>> 
>>> stats.chisquare([502,498],[500,500])[1]
0.8993431885613663
>>> proportions_ztest(count=502, nobs=1000, value=0.5)[1]
0.8993423875828498
>>> stats.chisquare([502,498],[500,500])[1] - proportions_ztest(count=502, nobs=1000, value=0.5)[1]
8.009785165130623e-07
>>> 
>>> stats.chisquare([600,400],[500,500])[1]
2.5396285894708634e-10
>>> proportions_ztest(count=400, nobs=1000, value=0.5)[1]
1.082387390934913e-10
>>> stats.chisquare([600,400],[500,500])[1] -proportions_ztest(count=400, nobs=1000, value=0.5)[1]
1.4572411985359504e-10

At first glance, the z-test seems to be more sensitive: it yields a slightly smaller p-value in both cases.

Maybe @geoprofi has an opinion here too?
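
One possible explanation for the gap, offered as an assumption rather than a verdict: by default (`prop_var=False`), `proportions_ztest` appears to estimate the standard error from the observed proportion, whereas the chi-square test uses the proportion expected under the null hypothesis. The stdlib-only sketch below uses hypothetical helper names and shows that, for two groups, a z-test using the null variance gives the same p-value as the chi-square test, while the sample-variance version reproduces the smaller value above.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def chi_square_p_two_groups(count, nobs, share=0.5):
    """Chi-square goodness-of-fit p-value for a two-group split."""
    observed = [count, nobs - count]
    expected = [nobs * share, nobs * (1.0 - share)]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With one degree of freedom the upper tail is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


def z_test_p(count, nobs, share=0.5, use_null_variance=True):
    """One-proportion z-test; the variance comes from the null or the sample."""
    p_hat = count / nobs
    var_p = share if use_null_variance else p_hat
    standard_error = math.sqrt(var_p * (1.0 - var_p) / nobs)
    return two_sided_p_from_z((p_hat - share) / standard_error)


print(chi_square_p_two_groups(600, 1000))                # chi-square
print(z_test_p(600, 1000, use_null_variance=True))       # same as chi-square
print(z_test_p(400, 1000, use_null_variance=False))      # sample variance, smaller
```

If that reading is right, the apparent extra sensitivity comes from the variance estimate, not from the choice of a z-test over a chi-square test.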

Is doing pairwise SRM tests ok?

Hello @lukasvermeer, first of all, thank you so much for your work on this! I have a quick question that pertains to calculating SRM. Would it be valid to look at control-variant pairs on top of the standard 'global' SRM test?

For example, if we had 4 variants including control, we would have 3 pairs:

SRM global - control - variant 1 - variant 2 - variant 3 (standard way)
SRM test, control - variant 1
SRM test, control - variant 2
SRM test, control - variant 3

Of course in the case that the control itself has a problem, all secondary tests would trigger, but if any of variants 1, 2, or 3 have a problem, this would enable us to know exactly which one.

From a naive analysis of the chi-square goodness of fit test, it would work out on the math level. But would it really be valid statistically speaking? 🙏
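
For concreteness, the pairwise checks described above might look like the sketch below. It does not settle whether the approach is statistically valid. The Bonferroni adjustment, the `alpha` default and the function names are assumptions added for illustration, since running several tests on the same data inflates the chance of a false alarm unless the threshold is adjusted.

```python
import math


def pairwise_srm_p_values(counts, shares):
    """Compare control (index 0) with each variant, one pair at a time."""
    p_values = []
    for i in range(1, len(counts)):
        pair_total = counts[0] + counts[i]
        # Expected share of control within this pair only.
        control_share = shares[0] / (shares[0] + shares[i])
        expected = [pair_total * control_share, pair_total * (1.0 - control_share)]
        observed = [counts[0], counts[i]]
        statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # One degree of freedom per pair: upper tail is erfc(sqrt(x / 2)).
        p_values.append(math.erfc(math.sqrt(statistic / 2.0)))
    return p_values


def flag_pairs(p_values, alpha=0.01):
    """Flag pairs below a Bonferroni-adjusted threshold (an assumed choice)."""
    adjusted_alpha = alpha / len(p_values)
    return [p < adjusted_alpha for p in p_values]


# Hypothetical counts: control plus three variants on an equal split,
# with the last variant short on users.
counts = [1000, 1000, 1000, 800]
shares = [0.25, 0.25, 0.25, 0.25]
print(flag_pairs(pairwise_srm_p_values(counts, shares)))
```

As noted above, a problem in the control itself would tend to flag every pair, so these checks only help localise an issue after the global test has fired.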