Giter Club home page Giter Club logo

runs's Introduction

Build Status DOI

Weighted-runs (SQUARES) statistic

This project contains a mathematica and a C++ implementation of the cumulative distribution function of the weighted-runs statistic originally defined in

Frederik Beaujean and Allen Caldwell. A Test Statistic for Weighted Runs. Journal of Statistical Planning and Inference 141, no. 11 (November 2011): 3437โ€“46. doi:10.1016/j.jspi.2011.04.022 arXiv:1005.3233

We derived an approximation to be able to compute the cumulative also for large number of observations in

Frederik Beaujean and Allen Caldwell. Is the bump significant? An axion-search example arXiv:1710.06642

where we renamed the weighted-runs statistic to the SQUARES statistic.

mathematica

The reference implementation is in the mathematica package RunsWeightedPackage. To use it from a mathematica notebook, download the package to a directory /package/dir on your computer and do

SetDirectory["/package/dir"]
Needs["RunsWeightedPackage`"]

All exported commands in the package are accessible via

?"RunsWeightedPackage`*"

The most important commands to compute the p value for the runs statistic are

(* compute the p value using the exact expression that involves summing over integer partitions *)
n = 10
pValueRuns[3.3, n]

(* use Monte Carlo experiments to approximate the p value. Much faster for n > 80 than the exact expression *)
n = 500
pValueRunsMC[35.3, n]

(* calculate the runs statistic from a list of values interpreted as independent samples from a standard normal distribution *)
runsSuccess
runsSuccess[{-1,1,3,-2}]==10

c++

partitions

The code features a standalone implementation of generating all integer partitions of n in the multiplicity representation based on the pseudocode of Algorithm Z in

A. Zoghbi: Algorithms for generating integer partitions, Ottawa (1993), http://www.ruor.uottawa.ca/handle/10393/6506.

I made the necessary modifications to partition n into exactly k parts as well. Some simple examples

#include <partitions.h>
#include <iostream>

using namespace partitions;

// print all partitions of 6 into any number of parts
for (PartitionGenerator gen(6); gen; ++gen)
    std::cout << *gen << std::endl;

// print all partitions of 6 into 3 parts
for (KPartitionGenerator gen(6, 3); gen; ++gen)
    std::cout << *gen << std::endl;

gen is an iterator to a Partition and permits visiting all partitions while keeping memory allocations to a minimum. A Partition is essentially a triple c, y, h (in Zoghbi's notation) where

n = \sum_{i=1}^h c_i * y_i
  • c = Partition::mult() refers to a vector with the multiplicities,
  • y = Partition::parts() refers to a vector with distinct parts,
  • h = Partition::distinct_parts() refers to the number of distinct parts.

To reduce the memory allocations, the generator updates the partition in place and the vectors c,y are long enough to hold the partition with the maximum number of parts and a buffer element. For example, the first partition of 6 into 3 parts is

6 = 4+1+1
  = 2*1 + 1*4

In the code, this becomes

h == 2

c[1] == 2
y[1] == 1

c[2] == 1
y[2] == 4

Ignore the first element c[0], y[0] and do not read beyond c[h],y[h]!

SQUARES statistic

Tobs denotes the value of the SQUARES test statistic; i.e., the largest \chi^2 of any run of consecutive successes (above expectation) in a sequence of N independent trials with Gaussian uncertainty. Then the cumulative distribution P(T < Tobs | N) and the p value P(T >= Tobs| N) are available as

include "squares.h"

squares::cumulative(Tobs, N);
squares::pvalue(Tobs, N);

openMP helps as the speed-up of evaluating squares::cumulative for large N>50 scales linearly with the number of physical cores and even benefits from hyperthreading.

split runs

For large N, the number of terms in the exact expressions scales like exp(N^1/2)/N and quickly grows too large. We implement an approximate formula for n*N, where for example N = 100 is computed exactly and n may be 1 or >> N and need not even be an integer.

#include "squares_approx.h"

squares::approx_cumulative(Tobs, N, n);
squares::approx_pvalue(Tobs, N, n);

The approximation involves a 1D numerical integration whose relative and absolute target precision can be set as additional arguments, for example

approx_cumulative(Tobs, N, n, epsrel, epsabs)

For the exact meaning of these parameters consult the GSL manual.

build instructions

cmake, c++11, and GSL are required. OpenMP is optional but recommended.

google test is needed for the tests and downloaded automatically when building the first time.

git clone https://github.com/fredRos/runs.git
cd runs
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
OMP_NUM_THREADS=4 ./runs_test

To specify where the library and headers should be installed, use

cmake -DCMAKE_INSTALL_PREFIX=/tmp/runs ..

To install

make install

To link to this in your own code, just link to the library and include the headers from whereever you chose to install them. For example if you installed to /tmp/runs/, compile, link, and run your code in runstest.cxx

#include "squares.h"
#include <iostream>

int main()
{
    double Tobs = 3.3;
    unsigned N = 10;
    std::cout << "F(Tobs = " << Tobs << " | N = " << N << ") = "
              << squares::cumulative(Tobs, N) << std::endl;

    return 0;
}

like this

export LD_LIBRARY_PATH=/tmp/runs/lib/

# with openmp: gcc >= 4.2, clang >= 3.8
g++ -fopenmp runstest.cxx -I/tmp/runs/include -L/tmp/runs/lib -lruns -lgsl -lblas && ./a.out

# without openmp: usually on a mac
g++ runstest.cxx -I/tmp/runs/include -L/tmp/runs/lib -lruns -lgsl -lblas && ./a.out

testing

In the build directory, call

./runs_test

To select individual tests

./runs_test --gtest_filter='splitruns.*'

Help on available options

build/runs_test -h

strong scaling

The C++ implementation of squares::cumulative and thus squares::pvalue benefits from hyperthreading. On an Intel Core i7-4770 with four cores and a maximum frequency of 3.4 GHz with gcc 5.4 and release mode, we observed the following run times for the unit test OMP_NUM_THREADS=n ./runs_test --gtest_filter=splitruns.paper_timing that computes squares::pvalue(T, N=96)

n time / ms speed up
1 9400 1
2 4750 2
4 2450 3.8
8 1750 5.4

This is a nice example where hyperthreading brings a noticeable improvement beyond the number of physical cores.

citing

If you use this code in an academic setting, please cite these references.

@article{beaujean2011test,
title={A test statistic for weighted runs},
author={Beaujean, Frederik and Caldwell, Allen},
journal={Journal of Statistical Planning and Inference},
volume={141},
number={11},
pages={3437--3446},
year={2011},
publisher={Elsevier}
}
@article{Beaujean:2017eyq,
  author         = "Beaujean, Frederik and Caldwell, Allen and Reimann, Olaf",
  title          = "{Is the bump significant? An axion-search example}",
  year           = "2017",
  eprint         = "1710.06642",
  archivePrefix  = "arXiv",
  primaryClass   = "hep-ex",
  SLACcitation   = "%%CITATION = ARXIV:1710.06642;%%"

}

license

The code is released under the MIT license, see the LICENSE file. It comes bundled with parts of the cubature package that is under the GPLv3.

runs's People

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.