dent's People

Contributors

ranweiler

Forkers

cemeyer

dent's Issues

Add flag for TSV output

Add a flag to the exe to support emitting TSV output with full floating point precision, with or without headers. By default, this should just emit TSV for each sample summary. If a flag is set to perform significance tests, then emit only the test results.
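A minimal sketch of what the TSV emitter could look like. `SummaryRow`, its fields, and `to_tsv` are hypothetical names for illustration, not dent's actual API. It relies on the fact that Rust's `{}` formatting of an `f64` prints the shortest decimal string that parses back to the same value, so no precision is lost.

```rust
/// Hypothetical row of summary statistics; the field set is an assumption.
pub struct SummaryRow {
    pub n: usize,
    pub min: f64,
    pub max: f64,
    pub mean: f64,
    pub std_dev: f64,
}

/// Render rows as TSV, optionally preceded by a header line.
pub fn to_tsv(rows: &[SummaryRow], with_header: bool) -> String {
    let mut out = String::new();
    if with_header {
        out.push_str("n\tmin\tmax\tmean\tstd_dev\n");
    }
    for r in rows {
        // `{}` on f64 round-trips, unlike a fixed `{:.3}` style format.
        out.push_str(&format!(
            "{}\t{}\t{}\t{}\t{}\n",
            r.n, r.min, r.max, r.mean, r.std_dev
        ));
    }
    out
}
```

A headers flag would then only toggle the `with_header` argument.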

Configurable strictness of sample data parsing

Right now, if any line in a file or stream of sample data cannot be parsed as a float, dent panics. We can improve on this in a few ways:

  • Add a permissive line reading option, which ignores blank or otherwise malformed lines. We could make this act like a verbosity flag. So, for example, if the long flag name were --lax and the short name were -l, -l would skip blank lines, but -ll would skip any line that can't be parsed as an f64.
  • If we do find an error, print useful info to stderr, such as a fragment of the bad input. For files, this can include line number.
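The two ideas above could be sketched roughly as follows. The names `Laxness`, `ParseError`, and `parse_samples` are assumptions made for this example, as is the 20-character limit on the reported fragment. The three levels correspond to no flag, `-l`, and `-ll`.

```rust
/// How forgiving the reader is. Variant order matters: later is laxer.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum Laxness {
    Strict,        // no flag: any unparseable line is an error
    SkipBlank,     // -l: blank lines are skipped
    SkipMalformed, // -ll: any line that is not an f64 is skipped
}

/// Describes the first offending line instead of panicking.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub line: usize,      // 1-based line number
    pub fragment: String, // at most 20 characters of the bad input
}

pub fn parse_samples(input: &str, lax: Laxness) -> Result<Vec<f64>, ParseError> {
    let mut data = Vec::new();
    for (i, raw) in input.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() && lax >= Laxness::SkipBlank {
            continue;
        }
        match line.parse::<f64>() {
            Ok(x) => data.push(x),
            Err(_) if lax >= Laxness::SkipMalformed => continue,
            Err(_) => {
                return Err(ParseError {
                    line: i + 1,
                    fragment: line.chars().take(20).collect(),
                })
            }
        }
    }
    Ok(data)
}
```

The caller can print the `ParseError` to stderr and exit with a non-zero status rather than unwinding.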

Fail nicely when we lack TTY resolution

Right now, especially when sample data is very far from normally distributed, we fail in weird ways, including some which panic (due to our liberal first-draft usage of unwrap()). Remove the panics and make plot rendering fail nicely.
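One way to replace the panics is to have the plotting code return a `Result` and let the caller decide how to report the failure. The sketch below is an assumption about how such a check might look, not dent's current rendering code; `PlotError` and `box_columns` are hypothetical names. It maps the quartiles onto terminal columns and reports an error when the box would collapse to zero width.

```rust
#[derive(Debug, PartialEq)]
pub enum PlotError {
    NonFinite,                               // NaN or infinite input
    DegenerateRange,                         // max is not greater than min
    InsufficientResolution { width: usize }, // box collapses at this width
}

/// Map q1 and q3 to column indices in `0..width`, or explain why we cannot.
pub fn box_columns(
    min: f64,
    q1: f64,
    q3: f64,
    max: f64,
    width: usize,
) -> Result<(usize, usize), PlotError> {
    if ![min, q1, q3, max].iter().all(|x| x.is_finite()) {
        return Err(PlotError::NonFinite);
    }
    if max <= min {
        return Err(PlotError::DegenerateRange);
    }
    if width < 2 {
        return Err(PlotError::InsufficientResolution { width });
    }
    // Linear map from [min, max] onto [0, width - 1].
    let col = |x: f64| (((x - min) / (max - min)) * (width - 1) as f64).round() as usize;
    let (lo, hi) = (col(q1), col(q3));
    if hi <= lo {
        Err(PlotError::InsufficientResolution { width })
    } else {
        Ok((lo, hi))
    }
}
```

On an error, the binary could still print the numeric summary and add a one-line note that the plot was skipped.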

Print boxplot quartile values, plot scale

The box plots visually indicate the lower and upper quartiles of each sample, but we don't print their values. We should also indicate the diagram scale (e.g. add labels for the overall min and max across samples).

Improve ergonomics of significance tests

Right now, we require the user to enter non-default significance levels in a very rigid format, and they must choose from a small set of such values. At a minimum, parse these levels more forgivingly, and explore moving from a table of critical values to runtime computation. Whatever level of flexibility we settle on, document it in the binary tool's help text.
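As an illustration of more forgiving parsing, here is a small sketch. The accepted spellings (surrounding whitespace, a bare leading dot, a trailing percent sign) and the name `parse_alpha` are assumptions for this example. The issue does not specify which formats should be allowed.

```rust
/// Parse a significance level such as "0.05", " .05 ", or "5%".
/// Returns `None` unless the result lies strictly between 0 and 1.
pub fn parse_alpha(s: &str) -> Option<f64> {
    let s = s.trim();
    let alpha = match s.strip_suffix('%') {
        // Percent form: "5%" means 0.05.
        Some(pct) => pct.trim().parse::<f64>().ok()? / 100.0,
        None => s.parse::<f64>().ok()?,
    };
    if alpha > 0.0 && alpha < 1.0 {
        Some(alpha)
    } else {
        None
    }
}
```

If the critical values still come from a table, the parsed level would additionally need to be matched against the table's entries, with a clear error listing the supported levels when there is no match.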

Significance test flags

Right now, we always perform a t-test exactly when we have two input samples. We should add flags to disable/enable this, and perhaps perform tests on all input files when there are more than two. For example, given input files a, b, and c, we could either run and display t-tests for {a, b} and {b, c} (compare the sequential pairs), or compare every two-element subset of {a, b, c}. The former is probably more useful in practice.
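The two pairing strategies can be sketched as index generators over the list of input files. The function names are hypothetical. For three inputs a, b, c (indices 0, 1, 2), the first yields {a, b} and {b, c}, and the second yields every two-element subset.

```rust
/// Compare each input with its successor: (0,1), (1,2), ...
pub fn sequential_pairs(n: usize) -> Vec<(usize, usize)> {
    (1..n).map(|i| (i - 1, i)).collect()
}

/// Compare every two-element subset: (0,1), (0,2), (1,2), ...
pub fn all_pairs(n: usize) -> Vec<(usize, usize)> {
    (0..n)
        .flat_map(|i| ((i + 1)..n).map(move |j| (i, j)))
        .collect()
}
```

Sequential pairing produces n - 1 tests, while all pairs produces n(n - 1)/2, which grows quickly and also raises the usual multiple-comparison concerns.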

Display multiple plots on same scale

If we plot more than one sample, we should plot them on the same scale by default, so they are visually comparable. We currently do this only when there are exactly 2 samples and we perform a t-test, and even then in an ad hoc way. This is probably best done when improving our corner case plot handling.
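A shared scale amounts to computing one global range across all samples and mapping every value through it. The following is a sketch under that assumption. `global_range` and `to_column` are hypothetical helpers, and the linear mapping is one reasonable choice rather than dent's actual one.

```rust
/// Overall (min, max) across all samples, or `None` if there is no data.
pub fn global_range(samples: &[Vec<f64>]) -> Option<(f64, f64)> {
    let mut values = samples.iter().flatten().copied();
    let first = values.next()?;
    Some(values.fold((first, first), |(lo, hi), x| (lo.min(x), hi.max(x))))
}

/// Map `x` linearly onto a column in `0..width` using the shared range.
pub fn to_column(x: f64, (lo, hi): (f64, f64), width: usize) -> usize {
    if width == 0 || hi <= lo {
        return 0;
    }
    (((x - lo) / (hi - lo)) * (width - 1) as f64).round() as usize
}
```

Because every plot uses the same `(lo, hi)` pair, equal values land in the same column on every row.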

Box and whisker plots

Support both single data set plots when summarizing and scaled, stacked plots when comparing file data. Include a flag in the binary, and expose in the library.

Add outlier-tolerant options

Especially when it comes to TTY plotting, a few extreme outliers can instantly blow our ratio budgets. For example, if the ratio (max - min) / (q3 - q1) is too large, we won't have the resolution to display a boxplot, because we can't make the box small enough relative to the overall size of the figure.

We can address this by providing options to do any of the following:

  • Filter outliers from the input data before any processing
  • Filter outliers from the input data before plotting only
  • Plot outliers, but determine the outer fences by a formula like Q ± 1.5 IQR, where Q is the 1st or 3rd quartile.
  • Allow different notions of "outlier"
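The fence formula in the list above can be sketched as below. The quartiles are taken as inputs because the quartile method itself is a separate choice. `tukey_fences` and `partition_outliers` are hypothetical names, and the multiplier `k` is left as a parameter so that different notions of "outlier" (for example 1.5 or 3.0) can be plugged in.

```rust
/// Outer fences at Q1 - k*IQR and Q3 + k*IQR.
pub fn tukey_fences(q1: f64, q3: f64, k: f64) -> (f64, f64) {
    let iqr = q3 - q1;
    (q1 - k * iqr, q3 + k * iqr)
}

/// Split data into (inliers, outliers) relative to the inclusive fences.
pub fn partition_outliers(data: &[f64], (lo, hi): (f64, f64)) -> (Vec<f64>, Vec<f64>) {
    data.iter().copied().partition(|x| (lo..=hi).contains(x))
}
```

Filtering before all processing would feed only the inliers into the summary, while filtering before plotting would summarize the full data and pass the inliers to the renderer.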

Remove `data` field from `Summary`

Right now, the Summary struct has a data field, and provides methods to recompute statistics on that sample data, on demand. Instead, make Summary just a read-only struct of the summary statistics without the data (which isn't editable or even readable right now), computed on construction. For now, continue to make a temporary sorted copy of the entire data set, but drop it when we return from the ctor.
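A rough sketch of the proposed shape follows. The particular statistics and accessor names are assumptions for illustration, since the actual fields of dent's `Summary` are not listed here. The sorted copy lives only inside the constructor and is dropped on return.

```rust
/// Read-only summary statistics, computed once on construction.
#[derive(Debug, Clone, PartialEq)]
pub struct Summary {
    size: usize,
    min: f64,
    max: f64,
    mean: f64,
    median: f64,
}

impl Summary {
    /// Returns `None` for empty input or input containing NaN.
    pub fn new(data: &[f64]) -> Option<Summary> {
        if data.is_empty() || data.iter().any(|x| x.is_nan()) {
            return None;
        }
        // Temporary sorted copy; dropped when this function returns.
        let mut sorted = data.to_vec();
        sorted.sort_by(f64::total_cmp);
        let n = sorted.len();
        let median = if n % 2 == 0 {
            (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
        } else {
            sorted[n / 2]
        };
        Some(Summary {
            size: n,
            min: sorted[0],
            max: sorted[n - 1],
            mean: sorted.iter().sum::<f64>() / n as f64,
            median,
        })
    }

    pub fn size(&self) -> usize { self.size }
    pub fn min(&self) -> f64 { self.min }
    pub fn max(&self) -> f64 { self.max }
    pub fn mean(&self) -> f64 { self.mean }
    pub fn median(&self) -> f64 { self.median }
}
```

With private fields and only getters, the struct cannot be mutated after construction, and there is no stored data left to recompute from.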
