hammerlab / cycledash
Variant Caller Analysis Dashboard and Data Management System
License: Other
When we call a lot of variants, it would be nice to give the user some guidance as to which variants should be inspected.
It's not clear that work is being done, particularly for large VCFs.
The current approach works only if we have a truth VCF, since we filter by tabs in the true/false positive/negative table. We should remove the tabs and add a different method of filtering variants, so that this also works without a truth VCF.
All/SNV/INDEL/SV
This way, we're not restricted to parsing the DP attribute out of every record's INFO field. It could be useful to analyze and visualize quality scores etc. in the same way.
A string like "Showing 1,234/5,678 variants" would be helpful for getting a sense of how much data there is and how narrow the filters are.
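A rough sketch of how such a label could be built. The helper name `showingLabel` is made up for illustration, and where the counts come from in the table component is not shown here.

```javascript
// Hypothetical helper: builds the "Showing X/Y variants" label.
function withThousands(n) {
  // Insert a comma before every group of three trailing digits.
  return String(n).replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

function showingLabel(shown, total) {
  return 'Showing ' + withThousands(shown) + '/' + withThousands(total) + ' variants';
}

console.log(showingLabel(1234, 5678)); // "Showing 1,234/5,678 variants"
```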
This may be tough wrt performance until the infinite scroll table is done #28
Right now, it re-renders with every character typed (which is nuts). Should wait until the input loses focus or "enter" is pressed to run the filter code and rerender.
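One possible shape for this, as a sketch only. `makeFilterHandlers` and `runFilter` are hypothetical names, and the event objects are assumed to expose `key` and `target.value` the way React's synthetic events do.

```javascript
// Hypothetical factory: returns handlers that run the filter only on Enter or blur.
function makeFilterHandlers(runFilter) {
  let lastApplied = null; // last value we filtered with, to skip redundant re-renders

  function apply(value) {
    if (value === lastApplied) return;
    lastApplied = value;
    runFilter(value);
  }

  return {
    // Ignore ordinary keystrokes; only Enter triggers the filter.
    onKeyDown(event) {
      if (event.key === 'Enter') apply(event.target.value);
    },
    // Leaving the input also applies whatever was typed.
    onBlur(event) {
      apply(event.target.value);
    },
  };
}
```

The two handlers would be attached to the filter input in place of a per-keystroke change handler.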
This may be best accomplished by adding 4 tables to the T/P/F/N Table, "All", "SNV", "INDEL", "SV", and filter the table accordingly.
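A minimal sketch of the All/SNV/INDEL/SV filtering. The classification rules and the `{REF, ALT}` record shape are assumptions for illustration, not what vcf.js or the worker currently do.

```javascript
// Hypothetical classifier; the rules below are assumptions, not cycledash's actual logic.
function variantType(ref, alt) {
  // Symbolic alleles such as <DEL>, or breakend notation containing [ or ],
  // are treated as structural variants.
  if (/^<.*>$/.test(alt) || /[\[\]]/.test(alt)) return 'SV';
  // Single-base REF and ALT: a SNV.
  if (ref.length === 1 && alt.length === 1) return 'SNV';
  // Everything else is lumped into INDEL here, including multi-base
  // substitutions, which a real implementation would likely separate.
  return 'INDEL';
}

// Keep only the records of the selected type; 'All' disables the filter.
function filterByType(records, type) {
  if (type === 'All') return records;
  return records.filter((r) => variantType(r.REF, r.ALT) === type);
}
```

Because this looks only at REF and ALT, it would not depend on a truth VCF being present.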
This may cause interference with vcf.tools, used in VCFTable.js
In order to speed up page rendering, the VCFTable should show n (=100) rows, and recycle DOM elements when scrolling, instead of keeping a DOM element for all displayed rows/records at all times. This should probably be in its own repo, as something like this may be widely useful.
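The core of this is working out which slice of rows to render for a given scroll position. A sketch, assuming fixed-height rows; the function name and the `overscan` parameter (extra rows rendered above and below the viewport) are made up here.

```javascript
// Hypothetical helper: which rows should exist in the DOM right now?
// Returns a half-open range [first, last) of row indices.
function visibleRowRange(scrollTop, rowHeight, viewportHeight, totalRows, overscan) {
  // Index of the first row touching the viewport, minus some slack above.
  const first = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
  // Rows that fit in the viewport, plus slack on both sides.
  const count = Math.ceil(viewportHeight / rowHeight) + 2 * overscan;
  const last = Math.min(totalRows, first + count);
  return { first, last };
}

console.log(visibleRowRange(2000, 20, 400, 5000, 5)); // { first: 95, last: 125 }
```

Only rows in that range would be rendered, with spacer elements above and below to keep the scrollbar the right size.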
Required for #27, probably.
When you load the page, the browser has to request each VCF file.
When you click a variant to open BioDalliance, it requests the exact same VCF files again. It would be more efficient to save a copy of each VCF file and re-use it.
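One way to re-use the files would be to cache the in-flight request per URL. This is a sketch; `makeCachedFetcher` is a hypothetical name, and `fetchText` stands in for whatever function actually performs the request and returns a promise.

```javascript
// Hypothetical wrapper: remembers the promise for each URL so that
// the same VCF is requested from the server only once.
function makeCachedFetcher(fetchText) {
  const cache = new Map(); // URL -> Promise of file contents
  return function get(url) {
    if (!cache.has(url)) cache.set(url, fetchText(url));
    return cache.get(url);
  };
}
```

Both the page load and the BioDalliance view would call the same `get`. Note that a failed request stays cached in this sketch, so a real version would probably evict rejected promises.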
They should be in a different color, a bar of its own next to each corresponding bin in the regular VCF in each AttributeChart (if a truth VCF exists)
Right now the base view allows you to see different runs of a caller, but for any dataset. Ideally we would first fix the dataset and then see multiple runs of the caller on that dataset and also across many different callers.
Should capture the failure of a worker (e.g. concordance was unable to run for some reason; we believe it could now) and allow the job to be restarted without hackery.
In both vcf.js and the worker, this should be fixed to account for other variants.
This should use the in-browser JSX transformer, so that we can cut browserify out of the iteration loop and use the React dev tools.
Note that we may want to consider how we handle multiple samples in a VCF.
The REF/ALT column sometimes contains large INDELs which come and go based on the filters present and cause the table columns to move horizontally. Let's fix column size and just display the full variant on rollover.
Should also properly parse string False in settings file.
Move to "omnibar" a la IGV, typing in things like "chr20:1,234,567-2,345,678" or just selecting the chromosome you want to move to by clicking it in the karyogram. Remove ability to select absolute ranges/base pairs.
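Parsing the omnibar input might look roughly like this. `parseRange` is a hypothetical name, and the accepted syntax (a bare contig name, or `contig:start-end` with optional commas) is an assumption based on the example string above.

```javascript
// Hypothetical parser for omnibar input such as "chr20:1,234,567-2,345,678"
// or just "chr20". Returns null when the input cannot be understood.
function parseRange(input) {
  const text = input.trim().replace(/,/g, ''); // drop thousands separators
  const m = /^([^:\s]+)(?::(\d+)-(\d+))?$/.exec(text);
  if (!m) return null;
  const contig = m[1];
  // No coordinates given: the whole contig was requested.
  if (m[2] === undefined) return { contig, start: null, end: null };
  const start = Number(m[2]);
  const end = Number(m[3]);
  if (start > end) return null;
  return { contig, start, end };
}

console.log(parseRange('chr20:1,234,567-2,345,678'));
// { contig: 'chr20', start: 1234567, end: 2345678 }
```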
For all of true positives, false positives, false negatives it would be nice to see some summary statistics or histograms on the variants called, some ideas:
On the runs page, when you click the "Examine" button, the row for that run visibly expands before the browser heads to the Examine page.
Probably need to add an event.preventDefault() or event.stopPropagation() somewhere.
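A guess at what the fix could look like, assuming the expansion is triggered by a click handler on the row that also sees clicks on the button. `onExamineClick` is a hypothetical name.

```javascript
// Hypothetical click handler for the "Examine" button.
function onExamineClick(event) {
  // Stop the click from bubbling up to the row, so a row-level
  // click handler (assumed to toggle expansion) never sees it.
  event.stopPropagation();
  // preventDefault() is deliberately not called: if the button is a
  // plain link, that would also cancel the navigation to the Examine page.
}
```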
Currently, errors just result in debug HTML with a stacktrace.
If you run gulp to get live reloading but forget a comma in a list, say, then it crashes:
throw er; // Unhandled 'error' event
^
ReactifyError: /Users/danvk/github/cycledash/cycledash/static/js/examine/BioDalliance.js: Parse Error: Line 15: Unexpected token } while parsing file: /Users/danvk/github/cycledash/cycledash/static/js/examine/BioDalliance.js
at throwError (/Users/danvk/github/cycledash/node_modules/reactify/node_modules/jstransform/node_modules/esprima-fb/esprima.js:2338:21)
at throwUnexpected (/Users/danvk/github/cycledash/node_modules/reactify/node_modules/jstransform/node_modules/esprima-fb/esprima.js:2400:9)
at parsePrimaryExpression (/Users/danvk/github/cycledash/node_modules/reactify/node_modules/jstransform/node_modules/esprima-fb/esprima.js:2909:9)
at parseLeftHandSideExpressionAllowCall (/Users/danvk/github/cycledash/node_modules/reactify/node_modules/jstransform/node_modules/esprima-fb/esprima.js:2990:61)
at parsePostfixExpression (/Users/danvk/github/cycledash/node_modules/reactify/node_modules/jstransform/node_modules/esprima-fb/esprima.js:3030:20)
at parseUnaryExpression (/Users/danvk/github/cycledash/node_modules/reactify/node_modules/jstransform/node_modules/esprima-fb/esprima.js:3097:16)
at parseBinaryExpression (/Users/danvk/github/cycledash/node_modules/reactify/node_modules/jstransform/node_modules/esprima-fb/esprima.js:3187:16)
at parseConditionalExpression (/Users/danvk/github/cycledash/node_modules/reactify/node_modules/jstransform/node_modules/esprima-fb/esprima.js:3247:16)
at parseAssignmentExpression (/Users/danvk/github/cycledash/node_modules/reactify/node_modules/jstransform/node_modules/esprima-fb/esprima.js:3457:16)
at parseObjectProperty (/Users/danvk/github/cycledash/node_modules/reactify/node_modules/jstransform/node_modules/esprima-fb/esprima.js:2690:80)
at parseObjectInitialiser (/Users/danvk/github/cycledash/node_modules/reactify/node_modules/jstransform/node_modules/esprima-fb/esprima.js
A better behavior would be to make an empty bundled.js file and stay alive.
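The "Unhandled 'error' event" crash is what Node does when an emitter emits `error` with no listener attached, so attaching one is enough to stay alive. A sketch using a bare `EventEmitter` in place of the real bundle stream; `surviveErrors` is a hypothetical helper, and writing the empty bundled.js is left out.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: log build errors instead of letting them crash the watcher.
function surviveErrors(stream, log) {
  stream.on('error', function (err) {
    log('Build failed: ' + err.message);
    // Signal completion so whatever is waiting on this stream can carry on.
    this.emit('end');
  });
  return stream;
}

// Demo with a bare EventEmitter standing in for the bundle stream.
const messages = [];
const fakeBundle = surviveErrors(new EventEmitter(), (m) => messages.push(m));
fakeBundle.emit('error', new Error('Parse Error: Line 15: Unexpected token }'));
console.log(messages[0]);
```

In the gulpfile the same listener would go on the stream that reactify/browserify errors surface on.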
This will require updating some code as well, as those have newer, breaking changes in their respective APIs.
This information can be parsed out of the VCF header.
Improve formatting: "-" in lieu of "N/A", etc.
For example:
http 'http://hammerlab-dev3.hpc.mssm.edu:5000/runs/18' > /tmp/run18.json
http POST :5000/runs < /tmp/run18.json
{
"error": "Run validation",
"message": "expected unicode for dictionary value @ data[u'reference_path']"
}
The issue is that the "falsePositive" field is set to null in the output, but this is not considered valid input. Removing the four null values gets you to another set of errors:
grep -v 'null' /tmp/run18.json > /tmp/run18.fixed.json
{
"error": "Run validation",
"message": "extra keys not allowed @ data[u'false_positive']"
}
You have to remove almost all the fields before you can POST. Here's the JSON that worked for me:
{
"dataset": "Synth1",
"truthVcfPath": "/datasets/dream/data/synthetic-challenge-1/synthetic.challenge.set1.tumor.all.truth.vcf",
"variantCallerName": "virmid",
"vcfPath": "/user/mondes02/kresults/virmid-Synth1-default/snv.vcf"
}
POSTing this results in a stack trace, but the run does show up on the /runs page.
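Until the GET output and the POST validation agree, a client-side workaround could strip a fetched run down to the fields that were accepted. This is a sketch: the key list is copied from the payload that worked above, and treating it as the complete accepted set is an assumption. `toPostableRun` is a hypothetical name.

```javascript
// Keys taken from the payload that worked above; treating this as the
// full accepted set is an assumption.
const POSTABLE_KEYS = ['dataset', 'truthVcfPath', 'variantCallerName', 'vcfPath'];

// Hypothetical helper: reduce a run object fetched from /runs/<id>
// to something the POST validator does not reject.
function toPostableRun(run) {
  const out = {};
  for (const key of POSTABLE_KEYS) {
    // Skip missing and null values, which the validator rejected.
    if (run[key] !== undefined && run[key] !== null) out[key] = run[key];
  }
  return out;
}
```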
Should be an env variable loaded in config.py.
This is currently hard-coded in BioDalliance.js, but it should either be set from the environment or proxied through the cycledash server.
Instead of posting precision, recall and f1, submit a VCF and truth dataset and have the evaluation run.
@ryan-williams and @arahuja both have ad-hoc systems for recording their notes on individual called variants. There should be a way to do this from the examine page.
A simple markdown text box would work well, perhaps with edit history as a bonus.
To ensure that entries made for the same dataset use the same dataset name, can we make this field autocomplete, or offer a dropdown of already-submitted names plus a "new" option?
Hopefully these would be individual columns on the examine page and on individual caller pages.
Use the new idiogrammatik version and:
Right now, I think a crash occurs. It should not.
There's no visual indication of whether a particular attribute is being charted after you click on the column header. Setting a different background color would make it clear which headers are being charted and which ones aren't.
A lot can be done to reduce redundancy in the code, and improve the speed in other ways. This PR has to do with /ihodes/vcf.js as well.
Arrow functions are really nice syntactic sugar for React components.