Giter Club home page Giter Club logo

maybe's Introduction

Maybe.NET

Build status Coverage Status

Maybe.NET is a lightweight library of probabilistic data structures for .NET. The library currently features Bloom Filters, Counting Bloom Filters, and Skip Lists. And more data structures are coming soon! Stop scouring the Internet and re-writing the same classes over and over -- use Maybe.NET.

Installation

Installation is super simple with NuGet! Just use this command to install from the Visual Studio Package Manager Console:

Install-Package Maybe.NET

Usage

Maybe.NET has a clear, intuitive API that is easy to pick up. You can check out the Maybe.Tests project for examples of using each method. Here are some quick examples to get you started.

Bloom Filter Usage

The bloom filter is a handy collection to which you can add items and check if an item is contained in the collection. They are very fast and memory-efficient, but it comes at a small cost: the filter can definitely say if an item is NOT in the collection, but it can't say for sure that an item is in the collection, only that it MIGHT be. You can use the constructor to specify your targeted maximum rate of errors. (Lower error rates may use more memory)

var filter = new BloomFilter<int>(50, 0.02);
filter.Add(42);
filter.Add(27);
filter.Add(33);

filter.Contains(55); // returns false (the item is NOT in the collection)
filter.Contains(27); // returns true (the item MIGHT be in the collection)

Counting Bloom Filter Usage

Counting bloom filters extend regular bloom filter functionality by allowing items to be removed from the collection. This can be useful functionality, but it opens the possibility of false negatives.

var filter = new CountingBloomFilter<int>(50, 0.02);
filter.Add(42);
filter.Contains(42); // returns true
filter.Remove(42);
filter.Contains(42); // returns false

Scalable Bloom Filter Usage

Scalable bloom filters offer the same Add and Contains operations that normal BloomFilter offers. The difference is that ScalableBloomFilter only needs to know the max error rate. The capacity of the bloom filter will grow by adding additional bloom filters internally, which allows the developer to add more items to the bloom filter without worrying about incurring too high of a false positive rate.

var filter = new ScalableBloomFilter<int>(0.02); // specifying 2% max error rate, no capacity required
filter.Add(42);
filter.Add(27);
filter.Add(33); // add as many items as needed. The scalable bloom filter will create as many filters as needed to hold data and keep error rates within tolerance.

filter.Contains(55); // returns false (the item is NOT in the collection)
filter.Contains(27); // returns true (the item MIGHT be in the collection)

Skip List Usage

Skip lists are sort of like a singly linked list, but they have additional links to other nodes further out in the list. The structure of the links is similar to building Binary Search into the Skip List. However, the Skip List uses randomness to avoid expensive balancing operations when the list is being modified. This structure allows for searching in logarithmic time on average, but doesn't incur the cost of balancing a tree that is normally incurred for fast search. See the wikipedia article for detailed information.

var list = new SkipList<int> {42, 33};
list.Contains(42); // returns true
list.Contains(91); // returns false

Count Min Sketch Usage

Count min sketch is a data structure that allows you to track the frequency of an item occurring within a large set. The count min sketch will never undercount items, but it can overcount by a controllable confidence interval. This is great for counting in very large (think big data) datasets where you can't deterministically fit everything into memory.

var sketch = new CountMinSketch<string>(5d, 0.95d, 42);
sketch.Add("test");
sketch.Add("foobar");
var estimate = sketch.EstimateCount("test"); // returns 1

Count min sketch also supports merging for parallel work. You can divide your workload and process in multiple Count Min Sketches in parallel. Then, merge the sketches from each workload at the end to see the overall result. Just make sure to initialize the sketches with the same configuration.

Contributing

Contributions are always welcome! Please feel free to submit pull requests and to open issues. I prefer to have tests on all public methods if possible and where ever else makes sense.

License

Free to use under MIT license

maybe's People

Contributors

rmc00 avatar romankreisel avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

maybe's Issues

Cannot install package for net462 project on VS2017

Environment:

  • Windows 10
  • Visual Studio 2017
  • C# project with framework 4.6.2
  • Maybe.Net v1.0.77

Error Message: (Sorry for Chinese locale, the main meaning is the package not compatible with net462)

正在还原 D:\GitProjects\BigsetExisting\BigsetExisting\BigsetExisting.csproj 的包...
包 Maybe.NET 1.0.77 与 net462 (.NETFramework,Version=v4.6.2) 不兼容。 包 Maybe.NET 1.0.77 支持: release (Release,Version=v0.0)
包 Maybe.NET 1.0.77 与 net462 (.NETFramework,Version=v4.6.2) / win 不兼容。 包 Maybe.NET 1.0.77 支持: release (Release,Version=v0.0)
包 Maybe.NET 1.0.77 与 net462 (.NETFramework,Version=v4.6.2) / win-x64 不兼容。 包 Maybe.NET 1.0.77 支持: release (Release,Version=v0.0)
包 Maybe.NET 1.0.77 与 net462 (.NETFramework,Version=v4.6.2) / win-x86 不兼容。 包 Maybe.NET 1.0.77 支持: release (Release,Version=v0.0)
程序包还原失败。正在回滚“BigsetExisting”的程序包更改。
已用时间: 00:00:00.2589401

Reproduce steps:

  1. New Console project with net framework 4.6.2
  2. Manage NuGet package
  3. Find & Add "Maybe.Net 1.0.77"

Performance issues?

Hi - thanks for creating a library for Bloom filters (with a great API!)

I have run some experiments using your scalable Bloom filters, but they do not seem to perform very well :(

I created a fork containing code for benchmarking the use cases I want to use your library for, as well as experiments with optimizing some of your code.

One performance bottleneck can be found in your choice of method for converting objects to bytes, in order to hash the bytes. I have created a project for benchmarking various methods for the conversion, showing that the BinaryFormatter used in your library performs horribly compared to the alternatives.

The benchmarks in my fork shows the performance improvements achievable by replacing BinaryFormatter with alternatives (I have included the benchmarking results at the bottom of this issue - notice the memory usage column on the far right)

Despite the new optimizations, the memory usage of your Bloom filters (esspecially the scalable version, which I really want to use) is very high compared to an alternative like a HashSet.

Is this just the nature of the implementation, or can it be improved?

(The table below is a part of the output of the benchmarking code. In the tabl, your original implementation of a scalable Bloom filter is called ScalableBloomFilter.)

BenchmarkDotNet=v0.11.5, OS=macOS Mojave 10.14.5 (18F203) [Darwin 18.6.0]
Intel Core i7-8850H CPU 2.60GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.2.300
  [Host] : .NET Core 2.2.5 (CoreCLR 4.6.27617.05, CoreFX 4.6.27618.01), 64bit RyuJIT
  Core   : .NET Core 2.2.5 (CoreCLR 4.6.27617.05, CoreFX 4.6.27618.01), 64bit RyuJIT

Job=Core  Runtime=Core  
Method ItemsToInsert MaximumErrorRate Mean Error StdDev Ratio RatioSD Rank Gen 0 Gen 1 Gen 2 Allocated
HashSet 1000 0.02 146.3 μs 0.1939 μs 0.1620 μs 1.00 0.00 1 15.1367 - - 70.31 KB
StringOptimizedScalableBloomFilter 1000 0.02 909.7 μs 5.6858 μs 5.0404 μs 6.21 0.03 2 234.3750 - - 1083.49 KB
GenericOptimizedScalableBloomFilter 1000 0.02 5,701.6 μs 42.8845 μs 40.1142 μs 39.00 0.27 3 2414.0625 - - 11144.79 KB
ScalableBloomFilter 1000 0.02 6,271.6 μs 56.0022 μs 52.3845 μs 42.89 0.41 4 2390.6250 - - 11029.23 KB
HashSet 10000 0.02 1,492.3 μs 3.4257 μs 3.0368 μs 1.00 0.00 1 152.3438 - - 703.13 KB
StringOptimizedScalableBloomFilter 10000 0.02 17,605.3 μs 45.3247 μs 37.8482 μs 11.80 0.04 2 4062.5000 - - 18814.31 KB
GenericOptimizedScalableBloomFilter 10000 0.02 106,426.8 μs 1,297.8431 μs 1,214.0032 μs 71.35 0.88 3 43600.0000 - - 201469.91 KB
ScalableBloomFilter 10000 0.02 114,675.1 μs 952.2085 μs 844.1081 μs 76.84 0.62 4 43200.0000 - - 199364.2 KB
HashSet 50000 0.02 8,018.3 μs 23.7326 μs 21.0383 μs 1.00 0.00 1 828.1250 - - 3828.13 KB
StringOptimizedScalableBloomFilter 50000 0.02 103,353.9 μs 1,100.2632 μs 975.3547 μs 12.89 0.11 2 26400.0000 - - 122239.81 KB
GenericOptimizedScalableBloomFilter 50000 0.02 693,657.8 μs 2,892.3871 μs 2,564.0259 μs 86.51 0.44 3 285000.0000 - - 1316991.46 KB
ScalableBloomFilter 50000 0.02 734,364.0 μs 9,817.7735 μs 9,183.5514 μs 91.65 1.16 4 282000.0000 - - 1303198.18 KB

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.