ranweiler / dent
Library and tool for summarizing, comparing small data sets
License: ISC License
Add a flag to the exe to support emitting TSV output with full floating-point precision, with or without headers. By default, this should just emit TSV for each sample summary. If a flag is set to perform significance tests, then emit only the test results.
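A minimal sketch of what header-optional, full-precision TSV output could look like. The `Row` struct, its field names, and `to_tsv` are hypothetical and not taken from dent's code. It relies on the fact that Rust's `{}` formatting of an `f64` prints the shortest decimal string that parses back to the same value.

```rust
/// Hypothetical per-sample summary row; the fields are illustrative only.
pub struct Row {
    pub name: String,
    pub n: usize,
    pub mean: f64,
    pub std_dev: f64,
}

/// Render rows as tab-separated values, optionally preceded by a header line.
/// `{}` on an `f64` emits a round-trippable representation, so no precision
/// is dropped the way a fixed `{:.2}` format would drop it.
pub fn to_tsv(rows: &[Row], with_header: bool) -> String {
    let mut out = String::new();
    if with_header {
        out.push_str("name\tn\tmean\tstd_dev\n");
    }
    for r in rows {
        out.push_str(&format!("{}\t{}\t{}\t{}\n", r.name, r.n, r.mean, r.std_dev));
    }
    out
}
```

A flag in the binary would then only need to choose the value of `with_header`.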
Show more precision by default, maybe make it configurable, and in any case, ensure column headers are aligned with data.
Right now, it is an error to invoke `dent sample.txt`. Instead, it should act like `dent -s < sample.txt`.
Right now, if any line in a file or stream of sample data cannot be parsed as a float, `dent` panics. We can improve on this in a few ways:
- Add an option for lenient parsing. For example, if the long name were `--lax` and the short name were `-l`, then `-l` would skip blank lines, but `-ll` would skip any line that can't be parsed as an `f64`.
- Report details of the failure on `stderr`, such as a fragment of the bad input. For files, this can include the line number.
Right now, especially when sample data is very far from normally distributed, we fail in weird ways, including some which panic (due to our liberal first-draft usage of `unwrap()`). Remove the panics and make plot rendering fail nicely.
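For the input-parsing issue above (skipping blank or unparsable lines instead of panicking), a panic-free reader might look like the sketch below. The `Lax` levels, `ParseError`, and `read_sample` are hypothetical names chosen to mirror the `-l` / `-ll` idea; none of them come from dent itself.

```rust
use std::io::BufRead;

/// Hypothetical strictness levels mirroring the proposed flags.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Lax {
    Strict,      // no flag: any bad line is an error
    SkipBlank,   // `-l`: ignore blank lines
    SkipInvalid, // `-ll`: ignore any line that is not an f64
}

/// Enough context to print a useful message on stderr.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub line: usize,      // 1-based line number
    pub fragment: String, // at most the first 20 characters of the bad line
}

/// Read one float per line, returning an error instead of panicking.
pub fn read_sample<R: BufRead>(reader: R, lax: Lax) -> Result<Vec<f64>, ParseError> {
    let mut data = Vec::new();
    for (i, line) in reader.lines().enumerate() {
        // An I/O failure is reported with a placeholder fragment.
        let line = match line {
            Ok(l) => l,
            Err(_) => {
                return Err(ParseError { line: i + 1, fragment: "<unreadable>".to_string() })
            }
        };
        let text = line.trim();
        if text.is_empty() && lax != Lax::Strict {
            continue;
        }
        match text.parse::<f64>() {
            Ok(x) => data.push(x),
            Err(_) if lax == Lax::SkipInvalid => continue,
            Err(_) => {
                return Err(ParseError {
                    line: i + 1,
                    fragment: text.chars().take(20).collect(),
                })
            }
        }
    }
    Ok(data)
}
```

Note that Rust's `f64` parser accepts strings such as `nan` and `inf`, so a real implementation would likely need to decide separately how to treat non-finite values.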
The box plots visually indicate the lower and upper quartiles of each sample, but we don't print their values. We should also indicate the diagram scale (e.g. add labels for the overall min and max across samples).
Make this interact nicely with plotting, too.
Right now, we require the user to enter non-default significance levels in a very rigid format, and they must choose from a small set of such values. At a minimum, parse these levels more forgivingly, and explore moving from a table of critical values to runtime computation. However flexible we are or are not, indicate the situation in the binary tool's help text.
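As one possible reading of "parse these levels more forgivingly", the sketch below accepts a decimal fraction or a percentage, with optional surrounding whitespace. The function name `parse_alpha` and the accepted spellings are assumptions for illustration; dent's current format and any critical-value table are not shown here.

```rust
/// Parse a significance level such as "0.05", ".05", "5%", or " 5 % ".
/// Returns `None` unless the result lies strictly between 0 and 1.
pub fn parse_alpha(input: &str) -> Option<f64> {
    let s = input.trim();
    let value = match s.strip_suffix('%') {
        // Percent form: parse the number before the sign and rescale.
        Some(pct) => pct.trim().parse::<f64>().ok()? / 100.0,
        // Plain form: the value is already a fraction.
        None => s.parse::<f64>().ok()?,
    };
    // NaN fails both comparisons, so it is rejected here as well.
    if value > 0.0 && value < 1.0 {
        Some(value)
    } else {
        None
    }
}
```

If critical values still come from a fixed table, the parsed level would additionally have to be matched against the supported entries, and the help text could list them.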
Right now, we always perform a t-test exactly when we have two input samples. We should add flags to disable/enable this, and perhaps perform tests on all input files when there are more than two. For example, given input files `a`, `b`, and `c`, we could either run and display t-tests for `{a, b}` and `{b, c}` (compare the sequential pairs), or compare every two-element subset of `{a, b, c}`. The former is probably more useful in practice.
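The two pairing strategies can be sketched as follows. Both helpers are hypothetical and only enumerate which inputs would be compared; running the t-test itself is left out.

```rust
/// Adjacent pairs in input order: (a, b), (b, c), ...
pub fn sequential_pairs<T>(items: &[T]) -> Vec<(&T, &T)> {
    items.windows(2).map(|w| (&w[0], &w[1])).collect()
}

/// Every two-element subset, in input order: (a, b), (a, c), (b, c), ...
pub fn all_pairs<T>(items: &[T]) -> Vec<(&T, &T)> {
    let mut pairs = Vec::new();
    for i in 0..items.len() {
        for j in (i + 1)..items.len() {
            pairs.push((&items[i], &items[j]));
        }
    }
    pairs
}
```

For `n` inputs, the sequential strategy yields `n - 1` tests while the exhaustive one yields `n * (n - 1) / 2`, which also affects how much output the tool has to display.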
If we plot more than one sample, we should plot them on the same scale by default, so they are visually comparable. We currently do this with exactly 2 samples, when performing t-tests, but in an ad hoc way. This is probably best done when improving our corner case plot handling.
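A shared scale could be derived from the overall minimum and maximum across all samples, with every plotted value then mapped to a terminal column on that scale. The sketch below assumes a simple linear mapping; `shared_range` and `column` are illustrative names, not existing dent functions.

```rust
/// Overall (min, max) across all samples, or `None` if there is no data.
pub fn shared_range(samples: &[Vec<f64>]) -> Option<(f64, f64)> {
    let mut values = samples.iter().flatten().copied();
    let first = values.next()?;
    Some(values.fold((first, first), |(lo, hi), x| (lo.min(x), hi.max(x))))
}

/// Map `x` linearly to a column in `0..width` on the shared scale.
/// Degenerate scales (zero width or zero range) map everything to column 0.
pub fn column(x: f64, (lo, hi): (f64, f64), width: usize) -> usize {
    if width == 0 || hi <= lo {
        return 0;
    }
    let t = ((x - lo) / (hi - lo)).clamp(0.0, 1.0);
    (t * ((width - 1) as f64)).round() as usize
}
```

Because every sample uses the same `(lo, hi)` pair, boxes drawn on separate rows line up column for column.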
Support both single data set plots when summarizing and scaled, stacked plots when comparing file data. Include a flag in the binary, and expose in the library.
Especially when it comes to TTY plotting, a few extreme outliers can instantly blow our ratio budgets. For example, if the ratio `(max - min) / (q3 - q1)` is too large, we won't have the resolution to display a boxplot, because we can't make the box small enough relative to the overall size of the figure.
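To make the "ratio budget" concrete, the sketch below computes the ratio and checks it against a plot width. The rule used in `box_fits` (the box must occupy at least `min_box_cols` columns) is an assumption made for illustration, as are both function names.

```rust
/// Ratio of the full range to the interquartile range.
/// Returns `None` when the IQR is not positive or the ratio is not finite.
pub fn range_to_iqr(min: f64, q1: f64, q3: f64, max: f64) -> Option<f64> {
    let iqr = q3 - q1;
    let ratio = (max - min) / iqr;
    if iqr > 0.0 && ratio.is_finite() {
        Some(ratio)
    } else {
        None
    }
}

/// Assumed rule: with `width` columns for the whole range, the box gets about
/// `width / ratio` columns, and it must be at least `min_box_cols` wide.
pub fn box_fits(ratio: f64, width: usize, min_box_cols: usize) -> bool {
    min_box_cols > 0 && ratio * (min_box_cols as f64) <= (width as f64)
}
```

For example, a range of 100 with an IQR of 2 gives a ratio of 50, so an 80-column plot leaves fewer than two columns for the box.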
We can address this by providing options to do any of the following:
- Use the bounds `Q ± 1.5 IQR`, where `Q` is the 1st or 3rd quartile.
If two samples `A` and `B` have significantly different means, compute and display the difference in those means, along with a confidence interval.
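The difference in means and its confidence interval could be computed roughly as below. This sketch assumes the unpooled (Welch) standard error and takes the critical t value as a parameter supplied by the caller, for instance from a table of critical values; whether dent uses this exact form is not stated here. `mean_var` and `mean_diff_ci` are hypothetical names.

```rust
/// Mean and unbiased sample variance; `None` for fewer than two points.
fn mean_var(xs: &[f64]) -> Option<(f64, f64)> {
    if xs.len() < 2 {
        return None;
    }
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    Some((mean, var))
}

/// Returns `(diff, lower, upper)` where `diff = mean(a) - mean(b)` and the
/// interval is `diff ± t_crit * se`, with `se` the unpooled standard error.
/// `t_crit` is assumed to match the chosen level and degrees of freedom.
pub fn mean_diff_ci(a: &[f64], b: &[f64], t_crit: f64) -> Option<(f64, f64, f64)> {
    let (mean_a, var_a) = mean_var(a)?;
    let (mean_b, var_b) = mean_var(b)?;
    let se = (var_a / (a.len() as f64) + var_b / (b.len() as f64)).sqrt();
    let diff = mean_a - mean_b;
    Some((diff, diff - t_crit * se, diff + t_crit * se))
}
```

An interval that excludes zero is consistent with the test having found a significant difference at the same level.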
Right now, the `Summary` struct has a `data` field, and provides methods to recompute statistics on that sample data, on demand. Instead, make `Summary` just a read-only struct of the summary statistics without the data (which isn't editable or even readable right now), computed on construction. For now, continue to make a temporary sorted copy of the entire data set, but drop it when we return from the ctor.
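A sketch of the proposed shape: statistics are computed once in the constructor from a temporary sorted copy, and only the results are stored. The particular fields and accessor names below are illustrative and do not reflect the real `Summary` definition.

```rust
/// Read-only summary statistics; the sample data itself is not retained.
#[derive(Debug, Clone, PartialEq)]
pub struct Summary {
    size: usize,
    min: f64,
    max: f64,
    mean: f64,
    median: f64,
}

impl Summary {
    /// Compute all statistics up front. Returns `None` for empty input or
    /// input containing NaN, rather than panicking.
    pub fn new(data: &[f64]) -> Option<Summary> {
        if data.is_empty() || data.iter().any(|x| x.is_nan()) {
            return None;
        }
        // Temporary sorted copy; it is dropped when this function returns.
        let mut sorted = data.to_vec();
        sorted.sort_by(|a, b| a.total_cmp(b));
        let n = sorted.len();
        let median = if n % 2 == 1 {
            sorted[n / 2]
        } else {
            (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
        };
        Some(Summary {
            size: n,
            min: sorted[0],
            max: sorted[n - 1],
            mean: sorted.iter().sum::<f64>() / (n as f64),
            median,
        })
    }

    pub fn size(&self) -> usize { self.size }
    pub fn min(&self) -> f64 { self.min }
    pub fn max(&self) -> f64 { self.max }
    pub fn mean(&self) -> f64 { self.mean }
    pub fn median(&self) -> f64 { self.median }
}
```

Keeping the fields private and exposing only getters makes the struct read-only to callers without any extra machinery.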