hll::hyperloglog: HyperLogLog++ with C++14

HyperLogLog is a probabilistic data structure that estimates the cardinality of very large multisets with a pre-determined accuracy using constant space.

Typical cardinality calculation methods, such as Python's collections.Counter or C++'s std::unordered_map, need O(n) space to count n unique elements. HyperLogLog, on the other hand, requires a constant O(1) amount of space to estimate the cardinality of multisets with billions of items within a pre-determined range of accuracy. For example, you can use HyperLogLog to estimate the number of unique IP addresses that connect to your web server, or the number of unique words in a book, to within a percent of the actual value, all with a few kilobytes of memory.
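
For comparison, the exact approach mentioned above can be sketched in a few lines. This is only an illustrative baseline and not part of this library; the function name exact_cardinality is made up for the example. It stores every distinct value it sees, which is why its memory use grows with the number of unique elements.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Exact distinct count (hypothetical helper, not part of hll::hyperloglog).
// Every unique value is kept in the set, so memory grows linearly with the
// number of distinct elements.
template <class T>
std::size_t exact_cardinality(const std::vector<T>& items) {
  std::unordered_set<T> seen;
  for (const T& item : items)
    seen.insert(item);  // inserting a duplicate leaves the set unchanged
  // The number of distinct values can never exceed the number of items.
  assert(seen.size() <= items.size());
  return seen.size();
}
```

Calling exact_cardinality on a vector holding 1, 2, 2, 3, 3, 3 returns 3. The answer is exact, but the set holds one entry per distinct value, which becomes impractical for billions of items.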

The tradeoff here is between the amount of constant space you allocate to the HyperLogLog data structure and the accuracy as determined by the relative error. The more registers you use, the more accurate your estimations are going to be. Each register is represented here by a uint8_t, although a maximum of 6 bits of each register is ever used. A HyperLogLog data structure with 2^m registers has a relative error of about 1.04/√(2^m). This means that a HyperLogLog data structure with m=12 uses 2^12 = 4096 registers (~4 kB) and has a relative error of about 1.6% for large multisets.
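
To make the register layout and the error estimate concrete, here is a minimal toy sketch of the classic HyperLogLog idea. It is not the implementation used by hll::hyperloglog: the class name toy_hll, the splitmix64-style hash mixer, and the simple small-range correction are all assumptions chosen for illustration, and the sparse representation and bias correction of HyperLogLog++ are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy HyperLogLog for illustration only (hypothetical class, not the
// library's implementation).
class toy_hll {
 public:
  // The sketch holds 2^precision one-byte registers.
  explicit toy_hll(unsigned precision)
      : p_(precision), registers_(std::size_t{1} << precision, 0) {
    assert(precision >= 7 && precision <= 18);  // alpha below needs >= 128 registers
  }

  std::size_t register_count() const { return registers_.size(); }

  // Theoretical relative error: 1.04 / sqrt(number of registers).
  double relative_error() const {
    return 1.04 / std::sqrt(static_cast<double>(registers_.size()));
  }

  void insert(std::uint64_t value) {
    const std::uint64_t h = mix(value);
    const std::size_t index = static_cast<std::size_t>(h >> (64 - p_));  // top p bits
    std::uint64_t rest = h << p_;  // remaining bits, left-aligned
    // rank = 1-based position of the first set bit among the remaining bits
    std::uint8_t rank = 1;
    while (rank <= 64 - p_ && !(rest & (std::uint64_t{1} << 63))) {
      rest <<= 1;
      ++rank;
    }
    if (rank > registers_[index]) registers_[index] = rank;  // keep the maximum
  }

  double estimate() const {
    const double m = static_cast<double>(registers_.size());
    double sum = 0.0;
    std::size_t zeros = 0;
    for (std::uint8_t r : registers_) {
      sum += std::ldexp(1.0, -static_cast<int>(r));  // 2^-r
      if (r == 0) ++zeros;
    }
    const double alpha = 0.7213 / (1.0 + 1.079 / m);
    const double raw = alpha * m * m / sum;  // harmonic-mean estimate
    // Small-range correction (linear counting) while registers are still empty.
    if (raw <= 2.5 * m && zeros > 0)
      return m * std::log(m / static_cast<double>(zeros));
    return raw;
  }

 private:
  // splitmix64-style mixer, used here as a stand-in hash function.
  static std::uint64_t mix(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
  }

  unsigned p_;
  std::vector<std::uint8_t> registers_;
};
```

With a precision of 12, register_count() returns 4096 (about 4 kB at one byte per register) and relative_error() returns 1.04/64 = 0.01625, matching the figures above. The largest rank this sketch can store is 58, so each register fits comfortably in 6 bits.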

Installation

This package relies on CMake. Make sure a moderately recent version (3.14 or newer) is already installed. You should also have a C++ compiler with C++14 support.

Here is how to build and run the tests:

$ git clone https://github.com/arashbm/hyperloglog.git
$ cd hyperloglog
$ mkdir build  # make directory to build in
$ cd build
$ cmake ..
$ cmake --build . --target hyperloglog_tests  # build the tests
$ ./hyperloglog_tests

At this point you should see "All tests passed" if all the steps were successful. You can continue by installing the library on your system:

$ cmake --build . --target install

This final step might require admin privileges.

Example

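// in "example.cpp"

#include <cstddef>   // std::size_t
#include <iostream>
#include <hll/hyperloglog.hpp>

int main() {
  hll::hyperloglog<std::size_t, 18, 25> h;

  // Insert ten million distinct values.
  for (std::size_t i = 1; i <= 10'000'000ul; i++)
    h.insert(i);

  // Print the estimated number of distinct values inserted.
  std::cout << "Estimate is " << h.estimate() << std::endl;
  return 0;
}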

Assuming you have installed the library as instructed above, you can compile and run example.cpp with:

$ g++ -std=c++14 -o example example.cpp -lhyperloglog
$ time ./example
Estimate is 1.00296e+07

real    0m0.669s
user    0m0.661s
sys     0m0.000s

On a relatively recent commodity CPU, hll::hyperloglog lets you insert ten million items and estimate their cardinality in less than a second.

See more examples of hll::hyperloglog in the tests located at src/tests.cpp.
