statrs-dev / statrs
Statistical computation library for Rust
Home Page: https://docs.rs/statrs/latest/statrs/
License: MIT License
Refactor traits to be more specific (e.g. trait Mode and trait Median). This should probably be done in the same step as this issue
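A minimal sketch of what more specific traits could look like, using hypothetical trait and struct definitions (not the statrs ones): each statistic gets its own trait, so a type implements only the statistics that are well defined for it, and generic code names exactly the capabilities it needs.

```rust
use std::f64::consts::LN_2;

/// Hypothetical fine-grained statistic traits (not the statrs definitions).
trait Median<T> {
    fn median(&self) -> T;
}

trait Mode<T> {
    fn mode(&self) -> T;
}

/// Hypothetical minimal exponential distribution with rate > 0
struct Exponential {
    rate: f64,
}

impl Median<f64> for Exponential {
    // The median of Exp(rate) is ln(2) / rate
    fn median(&self) -> f64 {
        LN_2 / self.rate
    }
}

impl Mode<f64> for Exponential {
    // The density is largest at 0
    fn mode(&self) -> f64 {
        0.0
    }
}

/// Hypothetical continuous uniform on [min, max]: it has a median but no unique
/// mode, so it simply does not implement `Mode`.
struct Uniform {
    min: f64,
    max: f64,
}

impl Median<f64> for Uniform {
    fn median(&self) -> f64 {
        (self.min + self.max) / 2.0
    }
}

/// Generic code asks only for the traits it actually uses
fn median_and_mode<D: Median<f64> + Mode<f64>>(d: &D) -> (f64, f64) {
    (d.median(), d.mode())
}
```

Calling `median_and_mode` with a `Uniform` is a compile-time error rather than a runtime panic, which is the main appeal of splitting the traits.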
Need code coverage for the case where the data length is > 10 in the rank function, in order to cover the quick_sort impl. As a side note, the quick sort impl is borrowed wholesale from Math.NET and there is probably a more idiomatic Rust implementation that could be used.
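One idiomatic alternative to a hand-ported quicksort, sketched as a hypothetical free function `ranks` (not the statrs API) with average-rank tie handling chosen for illustration: sort a vector of indices with the standard library's `sort_by` and `f64::total_cmp`, then give each run of equal values the mean of the ranks it spans.

```rust
/// Hypothetical rank function: 1-based ranks, ties get the average of their ranks.
fn ranks(data: &[f64]) -> Vec<f64> {
    // Sort indices by the values they point to; total_cmp is a total order on f64,
    // so the comparator never has to decide what to do with NaN.
    let mut idx: Vec<usize> = (0..data.len()).collect();
    idx.sort_by(|&a, &b| data[a].total_cmp(&data[b]));

    let mut out = vec![0.0; data.len()];
    let mut i = 0;
    while i < idx.len() {
        // Extend j over the run of values equal to data[idx[i]]
        let mut j = i;
        while j + 1 < idx.len() && data[idx[j + 1]] == data[idx[i]] {
            j += 1;
        }
        // Sorted positions i..=j (0-based) share the average of ranks i+1..=j+1
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for &k in &idx[i..=j] {
            out[k] = avg;
        }
        i = j + 1;
    }
    out
}
```

`sort_by` is a stable sort in std, so no custom sorting code is needed; a test with more than 10 elements then exercises the same path as a small one.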
Need resources on a good implementation of the inverse digamma (psi) function. The Math.NET code seems to be incorrect and doesn't even pass its own unit tests
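A sketch of one widely used approach, following the Newton iteration and initial guess described in Thomas Minka's technical note "Estimating a Dirichlet distribution": start from x = exp(y) + 1/2 when y >= -2.22, otherwise x = -1/(y + γ), then apply a few Newton steps x ← x - (ψ(x) - y)/ψ'(x). The `digamma` and `trigamma` helpers are hypothetical local implementations (recurrence plus asymptotic series, valid for x > 0), not statrs functions, and their accuracy has not been validated against reference tables.

```rust
/// Hypothetical digamma for x > 0: shift upward with psi(x) = psi(x + 1) - 1/x,
/// then use the asymptotic series, which is accurate once x >= 6.
fn digamma(mut x: f64) -> f64 {
    let mut acc = 0.0;
    while x < 6.0 {
        acc -= 1.0 / x;
        x += 1.0;
    }
    let inv = 1.0 / x;
    let inv2 = inv * inv;
    // ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) + 1/(240x^8) - 1/(132x^10)
    acc + x.ln() - 0.5 * inv
        - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 * (1.0 / 252.0 - inv2 * (1.0 / 240.0 - inv2 / 132.0))))
}

/// Hypothetical trigamma psi'(x) for x > 0: psi'(x) = psi'(x + 1) + 1/x^2, then
/// the asymptotic series 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9).
fn trigamma(mut x: f64) -> f64 {
    let mut acc = 0.0;
    while x < 6.0 {
        acc += 1.0 / (x * x);
        x += 1.0;
    }
    let inv = 1.0 / x;
    let inv2 = inv * inv;
    acc + inv + 0.5 * inv2
        + inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))
}

/// Inverse digamma: solve psi(x) = y for x > 0 by Newton's method
/// (initial guess as in Minka's note).
fn inv_digamma(y: f64) -> f64 {
    const EULER_GAMMA: f64 = 0.577_215_664_901_532_9; // equals -psi(1)
    let mut x = if y >= -2.22 { y.exp() + 0.5 } else { -1.0 / (y + EULER_GAMMA) };
    // Minka reports that five iterations suffice for about 14 digits
    for _ in 0..5 {
        x -= (digamma(x) - y) / trigamma(x);
    }
    x
}
```

Round-tripping `inv_digamma(digamma(x))` over a range of x is a cheap first unit test; comparing `digamma` itself against published values would still be needed.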
Need to improve README documentation
Currently the responsibility for guarding against exceptional cases (e.g. input not in the valid domain, mathematically invalid operations, etc.) is passed to the user. We panic when an operation does not make mathematical sense (e.g. calculating the cumulative distribution function for discrete distributions at a negative input), which forces users to double-check that their inputs are valid. While this results in technically correct and predictable behavior from the API, I'm not sure if it's ergonomic or idiomatic, and I have been mulling over introducing a Result-based API, either replacing or in addition to the stricter panic-based API. This, however, warrants some discussion and I would love feedback from the community
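A minimal sketch of the "in addition to" option, using hypothetical names (`StatsError`, `Geometric`, `checked_cdf`) that are not the statrs API: the checked method returns a `Result` for out-of-domain input such as a negative argument to a discrete CDF, and the panicking method becomes a thin wrapper over it.

```rust
use std::fmt;

/// Hypothetical error type for invalid parameters and out-of-domain arguments
#[derive(Debug, Clone, PartialEq)]
pub enum StatsError {
    BadParameter(&'static str),
    OutOfDomain { arg: &'static str, value: f64 },
}

impl fmt::Display for StatsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StatsError::BadParameter(msg) => write!(f, "invalid parameter: {}", msg),
            StatsError::OutOfDomain { arg, value } => {
                write!(f, "{} = {} is outside the valid domain", arg, value)
            }
        }
    }
}

impl std::error::Error for StatsError {}

/// Hypothetical geometric distribution: trials up to and including the first success
pub struct Geometric {
    p: f64,
}

impl Geometric {
    pub fn new(p: f64) -> Result<Self, StatsError> {
        if p > 0.0 && p <= 1.0 {
            Ok(Geometric { p })
        } else {
            Err(StatsError::BadParameter("p must be in (0, 1]"))
        }
    }

    /// Checked variant: rejects negative input instead of panicking
    pub fn checked_cdf(&self, k: i64) -> Result<f64, StatsError> {
        if k < 0 {
            return Err(StatsError::OutOfDomain { arg: "k", value: k as f64 });
        }
        // P(X <= k) = 1 - (1 - p)^k
        Ok(1.0 - (1.0 - self.p).powf(k as f64))
    }

    /// Panicking variant for callers who have already validated their input
    pub fn cdf(&self, k: i64) -> f64 {
        self.checked_cdf(k).unwrap_or_else(|e| panic!("{}", e))
    }
}
```

The `checked_` prefix borrows the convention of std's integer methods such as `checked_add`, which return `None` instead of panicking on overflow.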
I'm not personally familiar enough with the gamma distribution to say what the desired behavior for pmf, ln_pmf, and cdf should be if shape or rate are f64::INFINITY
I just saw this in the code:
```rust
#[ignore]
#[test]
fn test_mean_variance_stability() {
    // TODO: Implement tests. Depends on Mersenne Twister RNG implementation.
    // Currently hesitant to bring in an extra dependency just for a test
}
```
You can add dependencies to the Cargo.toml that are only used when running the tests, but not when using the library as a dependency: http://doc.crates.io/specifying-dependencies.html#development-dependencies
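The linked Cargo documentation covers the `[dev-dependencies]` section, which is the usual route. As a dependency-free alternative, the ignored test could be filled in roughly as follows. This is a sketch under stated assumptions: SplitMix64 is swapped in for the Mersenne Twister the TODO mentions, normals come from the Box-Muller transform, and the N(1e9, 2²) setup, seed, and tolerances are illustrative choices, not values taken from statrs or Math.NET.

```rust
/// SplitMix64: a tiny deterministic PRNG, used here as an assumed stand-in for a
/// seeded Mersenne Twister so the test needs no extra crate.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in (0, 1]; excluding 0 keeps ln() finite below
    fn next_open01(&mut self) -> f64 {
        ((self.next_u64() >> 11) + 1) as f64 / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform
    fn next_normal(&mut self) -> f64 {
        let u1 = self.next_open01();
        let u2 = self.next_open01();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Two-pass mean and unbiased variance. Subtracting the mean before squaring
/// avoids the cancellation that the one-pass sum-of-squares formula suffers
/// when the data sit far from zero.
fn mean_variance(data: &[f64]) -> (f64, f64) {
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    let var = data.iter().map(|&x| (x - mean) * (x - mean)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

#[test]
fn test_mean_variance_stability() {
    // Fixed seed: identical samples on every run
    let mut rng = SplitMix64::new(42);
    // Large offset (1e9) with small spread (sd = 2) stresses numerical stability
    let data: Vec<f64> = (0..10_000).map(|_| 1e9 + 2.0 * rng.next_normal()).collect();
    let (mean, var) = mean_variance(&data);
    assert!((mean - 1e9).abs() < 0.2);
    assert!((var - 4.0).abs() < 0.5);
}
```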
Currently the iterator statistics trait is treated as a special case: to act over an iterator, the methods need to take a mutable reference, so all the traits from statrs::statistics are (going to be) combined in the IterStatistics trait, which is implemented for all Iterators. I haven't come up with a better solution, but for some reason this implementation doesn't sit well with me and I'd love to have someone review it and provide feedback.
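A minimal sketch of how a blanket implementation over all iterators can look, using a stand-in trait named after IterStatistics (not its actual statrs definition). The methods take `&mut self` because computing a statistic advances the iterator; since an iterator can be walked only once, `variance` uses Welford's single-pass update rather than a two-pass formula.

```rust
/// Hypothetical stand-in for an iterator statistics trait
trait IterStatistics {
    fn mean(&mut self) -> f64;
    fn variance(&mut self) -> f64;
}

// Blanket impl: every iterator over f64 gets these methods
impl<I> IterStatistics for I
where
    I: Iterator<Item = f64>,
{
    fn mean(&mut self) -> f64 {
        let mut count = 0u64;
        let mut mean = 0.0;
        // `self` is `&mut I`, which is itself an Iterator
        for x in self {
            count += 1;
            mean += (x - mean) / count as f64;
        }
        if count == 0 { f64::NAN } else { mean }
    }

    fn variance(&mut self) -> f64 {
        // Welford's single-pass update of the mean and sum of squared deviations
        let (mut count, mut mean, mut m2) = (0u64, 0.0, 0.0);
        for x in self {
            count += 1;
            let delta = x - mean;
            mean += delta / count as f64;
            m2 += delta * (x - mean);
        }
        if count < 2 { f64::NAN } else { m2 / (count - 1) as f64 }
    }
}
```

One consequence of this design: each call drains the iterator, so computing both the mean and the variance of the same data requires recreating (or cloning) the iterator for the second call.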
Port over streaming statistics. Implement the statistics trait for Vec (should be a simple wrapper around the slice implementation)
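A sketch of the wrapper idea, with a hypothetical two-method `Statistics` trait (not the statrs trait): the slice impl does the work and the `Vec<f64>` impl forwards to it through `as_slice()`.

```rust
/// Hypothetical statistics trait; `Self` may be unsized, so it can be implemented for [f64]
trait Statistics {
    fn mean(&self) -> f64;
    fn range(&self) -> f64;
}

impl Statistics for [f64] {
    fn mean(&self) -> f64 {
        if self.is_empty() {
            return f64::NAN;
        }
        self.iter().sum::<f64>() / self.len() as f64
    }

    fn range(&self) -> f64 {
        if self.is_empty() {
            return f64::NAN;
        }
        let min = self.iter().fold(f64::INFINITY, |a, &b| a.min(b));
        let max = self.iter().fold(f64::NEG_INFINITY, |a, &b| a.max(b));
        max - min
    }
}

// Vec is a thin wrapper: forward every method to the slice impl
impl Statistics for Vec<f64> {
    fn mean(&self) -> f64 {
        self.as_slice().mean()
    }

    fn range(&self) -> f64 {
        self.as_slice().range()
    }
}

/// Generic code bounded on the trait; ?Sized admits [f64] as well as Vec<f64>
fn spread<T: Statistics + ?Sized>(data: &T) -> f64 {
    data.range()
}
```

Method-call syntax would already reach the slice impl through auto-deref (`v.mean()` on a `Vec<f64>`), so the explicit `Vec` impl mainly matters for generic code bounded on `T: Statistics`, as in `spread`.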
Distributions are currently Univariate<i64, f64> but I'm pretty sure they can all be changed to Univariate<u64, f64>
Go through code and remove matches on floats, replace them with if/else. See link
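A small illustration of the mechanical change, using a hypothetical helper `ln_binomial_kernel` that computes ln(p^k (1 - p)^(n - k)). The edge cases p = 0 and p = 1 need special handling because the general formula would evaluate 0 * ln(0) = 0 * (-inf), which is NaN in IEEE 754 arithmetic; instead of matching on the float literals `0.0` and `1.0`, the branches use if/else with explicit `==` comparisons.

```rust
/// Hypothetical helper: ln(p^k * (1 - p)^(n - k)) for 0 <= p <= 1 and k <= n.
///
/// A float-pattern version might have read
/// `match p { 0.0 => ..., 1.0 => ..., _ => ... }`;
/// the same cases are written below as if/else.
fn ln_binomial_kernel(p: f64, k: u64, n: u64) -> f64 {
    debug_assert!(k <= n, "k must not exceed n");
    if p == 0.0 {
        // Only k = 0 has nonzero probability (0^0 * 1^n = 1)
        if k == 0 { 0.0 } else { f64::NEG_INFINITY }
    } else if p == 1.0 {
        // Only k = n has nonzero probability
        if k == n { 0.0 } else { f64::NEG_INFINITY }
    } else {
        // General case; at the edges above it would produce 0 * ln(0) = NaN
        k as f64 * p.ln() + (n - k) as f64 * (1.0 - p).ln()
    }
}
```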
Port over Numerics/Statistics extensions for f64 slice and iterable
For certain distributions (Bernoulli comes to mind), the second generic parameter N is hidden: Bernoulli depends on Binomial, which has two parameters P and N, but the N parameter is always 1 in the Bernoulli distribution. This effectively prevents users from defining the numeric types for such a distribution with the constructor and requires them to explicitly define the type on the variable, leading to verbose declarations such as let n: Result<Bernoulli<f64, u64>> = Bernoulli::new(0.5). I'd like to find a way around this issue after 0.4.0 is released (or maybe before, if a solution is found quickly enough).
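One possible workaround, sketched with a hypothetical stand-in struct rather than the statrs types: define the constructor only in an inherent impl for the concrete types. When that is the only inherent `new`, the compiler infers both generic parameters from it, which is the same mechanism that lets std's `HashMap::new()` (defined only for the `RandomState` hasher) work without annotations.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in with two generic parameters:
/// P is the probability type, N the (phantom) count type.
#[derive(Debug)]
pub struct Bernoulli<P, N> {
    p: P,
    _n: PhantomData<N>,
}

// Constructor only for the common concrete types. Because this is the only
// inherent `new`, `Bernoulli::new(0.5)` infers P = f64 and N = u64.
impl Bernoulli<f64, u64> {
    pub fn new(p: f64) -> Result<Self, String> {
        // `contains` is false for NaN, so NaN is rejected as well
        if (0.0..=1.0).contains(&p) {
            Ok(Bernoulli { p, _n: PhantomData })
        } else {
            Err(format!("p must be in [0, 1], got {}", p))
        }
    }
}

// Methods that do not construct values can stay fully generic
impl<P: Copy, N> Bernoulli<P, N> {
    pub fn p(&self) -> P {
        self.p
    }
}
```

The trade-off is that a second constructor for, say, `f32` probabilities could not also be called `new` without making a bare `Bernoulli::new(0.5)` ambiguous again, so it would need a distinct (hypothetical) name.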
Currently statrs only supports f64, but I'd like to at minimum extend that to f32 and possibly other numeric types as well (especially for things in the statistics module). The num crate might be worth looking into, but I'm hesitant about introducing the dependency when it might make its way into the standard library at some point.
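A minimal sketch of one dependency-free route, assuming a small hypothetical local trait `Real` that exposes only the operations the statistics code needs, implemented for `f32` and `f64` with a macro so the two impls cannot drift apart.

```rust
use std::ops::{Add, Div};

/// Hypothetical local numeric trait covering only what `mean` needs
trait Real: Copy + Add<Output = Self> + Div<Output = Self> {
    fn zero() -> Self;
    fn from_usize(n: usize) -> Self;
    fn nan() -> Self;
}

// One macro generates the impl for each float width
macro_rules! impl_real {
    ($($t:ty),*) => {
        $(
            impl Real for $t {
                fn zero() -> Self { 0.0 }
                fn from_usize(n: usize) -> Self { n as $t }
                fn nan() -> Self { <$t>::NAN }
            }
        )*
    };
}

impl_real!(f32, f64);

/// Arithmetic mean, generic over the float type; NaN for empty input
fn mean<T: Real>(data: &[T]) -> T {
    if data.is_empty() {
        return T::nan();
    }
    let sum = data.iter().fold(T::zero(), |acc, &x| acc + x);
    sum / T::from_usize(data.len())
}
```

Each new statistic adds whatever operations it needs to `Real`, so the trait grows only as far as the library actually requires.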
Next time the crate is published, use nightly cargo in order to add ourselves to the science category on crates.io
Experiment with sampling based on the normal distribution in the rand crate (or, better yet, implement the ziggurat algorithm for normal sampling)
References:
https://github.com/rust-lang-nursery/rand/blob/master/src/distributions/mod.rs#L224
https://github.com/rust-lang-nursery/rand/blob/master/src/distributions/ziggurat_tables.rs
Special functions sorely in need of unit testing
Some distributions panic:
Some distributions return 0 or 1:
Some distributions return NaN:
No error handling defined:
My gut reaction is to panic on all invalid input domains for cdf and possibly special functions as well. The other options are to move towards propagating NaN or returning Result<T, StatsError> for functions like cdf, pdf, pmf, ln_pdf, and ln_pmf (possibly including special functions).
Implement first multivariate distribution (Dirichlet)
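A minimal sketch, assuming a hypothetical `Dirichlet` struct that is not the statrs type: it shows what makes the distribution multivariate, namely that its moments are vectors indexed by component, using the standard formulas E[X_i] = α_i / α_0 and Var[X_i] = α_i (α_0 - α_i) / (α_0² (α_0 + 1)), where α_0 is the sum of the α_i.

```rust
/// Hypothetical minimal Dirichlet distribution with concentration parameters alpha
#[derive(Debug)]
struct Dirichlet {
    alpha: Vec<f64>,
}

impl Dirichlet {
    /// Requires at least two components, all finite and strictly positive
    fn new(alpha: Vec<f64>) -> Result<Self, &'static str> {
        if alpha.len() < 2 {
            return Err("alpha needs at least two components");
        }
        if alpha.iter().any(|&a| !(a > 0.0 && a.is_finite())) {
            return Err("every alpha_i must be finite and > 0");
        }
        Ok(Dirichlet { alpha })
    }

    /// alpha_0 = sum of all alpha_i
    fn alpha0(&self) -> f64 {
        self.alpha.iter().sum()
    }

    /// E[X_i] = alpha_i / alpha_0
    fn mean(&self) -> Vec<f64> {
        let a0 = self.alpha0();
        self.alpha.iter().map(|&a| a / a0).collect()
    }

    /// Var[X_i] = alpha_i (alpha_0 - alpha_i) / (alpha_0^2 (alpha_0 + 1))
    fn variance(&self) -> Vec<f64> {
        let a0 = self.alpha0();
        self.alpha
            .iter()
            .map(|&a| a * (a0 - a) / (a0 * a0 * (a0 + 1.0)))
            .collect()
    }
}
```

Because `mean` returns a `Vec<f64>`, a distribution like this cannot fit a univariate trait whose statistics return a single `f64`, which is one reason the first multivariate distribution tends to force new traits.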
Modify all panicking special functions to return Result to be more in accordance with RFC 236
The documentation for special functions (e.g. gamma, erf, etc.) leaves something to be desired
Remove unchecked versions of distributions in favor of checked versions returning Result in order to be more in accordance with RFC 236.
Implement a macro as shorthand for prec::almost_eq
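A sketch of such a macro. To keep it self-contained it calls a local stand-in `almost_eq(a, b, acc)`, which is assumed (not verified) to behave like `prec::almost_eq`: true when `a` and `b` differ by less than `acc`, with equal infinities treated as equal. The macro name `assert_almost_eq!` and the two-argument default accuracy are hypothetical choices.

```rust
/// Local stand-in assumed to mirror prec::almost_eq(a, b, acc)
fn almost_eq(a: f64, b: f64, acc: f64) -> bool {
    if a.is_infinite() && b.is_infinite() {
        return a == b;
    }
    (a - b).abs() < acc
}

/// Hypothetical shorthand: assert_almost_eq!(a, b, acc) or assert_almost_eq!(a, b)
macro_rules! assert_almost_eq {
    ($a:expr, $b:expr, $acc:expr) => {{
        // Evaluate each argument once and pin the types to f64
        let (a, b, acc): (f64, f64, f64) = ($a, $b, $acc);
        assert!(
            almost_eq(a, b, acc),
            "assertion failed: {} is not within {} of {}",
            a, acc, b
        );
    }};
    // Two-argument form with an assumed default accuracy
    ($a:expr, $b:expr) => {
        assert_almost_eq!($a, $b, 1e-10)
    };
}
```

Binding the arguments to locals inside the macro means expressions like `0.1 + 0.2` are evaluated once, both for the comparison and for the failure message.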
Expand sequence generators to allow for infinite iteration
Reference:
https://github.com/mathnet/mathnet-numerics/blob/master/src/Numerics/Generate.cs
https://github.com/mathnet/mathnet-numerics/blob/master/src/UnitTests/GenerateTests.cs
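A sketch of infinite generators in Rust, with hypothetical function names (`linear_seq`, `geometric_seq`, `linspace`) that are not meant to mirror the Math.NET Generate class exactly: returning `impl Iterator` from an unbounded source makes the sequence lazy and infinite, and callers bound it with `take` when they need a finite range.

```rust
use std::iter;

/// Infinite arithmetic sequence: start, start + step, start + 2*step, ...
/// Each term is computed as start + i * step rather than by repeated addition,
/// so rounding error does not accumulate along the sequence.
fn linear_seq(start: f64, step: f64) -> impl Iterator<Item = f64> {
    (0u64..).map(move |i| start + step * i as f64)
}

/// Infinite geometric sequence: first, first * ratio, first * ratio^2, ...
fn geometric_seq(first: f64, ratio: f64) -> impl Iterator<Item = f64> {
    iter::successors(Some(first), move |&x| Some(x * ratio))
}

/// Finite case recovered from the infinite one: n evenly spaced points
/// from start to stop, bounded with take(n).
fn linspace(start: f64, stop: f64, n: usize) -> Vec<f64> {
    match n {
        0 => Vec::new(),
        1 => vec![start],
        _ => {
            let step = (stop - start) / (n - 1) as f64;
            linear_seq(start, step).take(n).collect()
        }
    }
}
```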
See rust-lang/rust#43680 for more details
Modify the Distribution trait to implement the Sample and IndependentSample traits from the rand crate instead of defining its own sample function
Traits like mean, mode, median, entropy, etc. don't need to be in the Distribution module or dependent on the Distribution trait
Port over additional functionality from special functions package
Need a rustfmt.toml to enforce consistent style guidelines
See #60
Math.NET has an implementation but no unit tests, and I couldn't find a source for the calculation from a cursory search. If anyone can provide a source or derivation for either a closed form or numerical solution (or prove the calculation for https://github.com/mathnet/mathnet-numerics/blob/master/src/Numerics/Distributions/Categorical.cs#L274) then we can move forward with implementation
Discrete distributions currently implement Distribution<f64> but this should be changed to Distribution<i64> to be more accurate