Giter Club home page Giter Club logo

xorfilter's Introduction

xorfilter: Go library implementing xor and binary fuse filters

GoDoc Build Status

Bloom filters are used to quickly check whether an element is part of a set. Xor and binary fuse filters are a faster and more concise alternative to Bloom filters. They are also smaller than cuckoo filters.

We are assuming that your set is made of 64-bit integers. If you have strings or other data structures, you need to hash them first to a 64-bit integer. It is not important to have a good hash function, but collision should be unlikely (~1/2^64).

The current implementation has a false positive rate of about 0.3% and a memory usage of less than 9 bits per entry for sizeable sets.

You construct the filter as follows starting from a slice of 64-bit integers:

filter,_ := xorfilter.PopulateBinaryFuse8(keys) // keys is of type []uint64

It returns an object of type BinaryFuse8. The 64-bit integers would typically be hash values of your objects.

You can then query it as follows:

filter.Contains(v) // v is of type uint64

It will always return true if v was part of the initial construction (Populate) and almost always return false otherwise.

An xor filter is immutable, it is concurrent. The expectation is that you build it once and use it many times.

Though the filter itself does not use much memory, the construction of the filter needs many bytes of memory per set entry.

For persistence, you only need to serialize the following data structure:

type BinaryFuse8 struct {
	Seed               uint64
	SegmentLength      uint32
	SegmentLengthMask  uint32
	SegmentCount       uint32
	SegmentCountLength uint32

	Fingerprints []uint8
}

Duplicate keys

When constructing the filter, you should ensure that there are not too many duplicate keys. If you are hashing objects with a good hash function, you should have no concern, because there should be very few collisions. However, you can construct cases where there are many duplicates. If you think that this might happen, then you should check the error condition.

filter,err := xorfilter.PopulateBinaryFuse8(keys) // keys is of type []uint64
if err != nil {
 // you have too many duplicate keys, de-duplicate them?
}

Effectively, an error is returned when the filter could not be build after MaxIterations iterations (default to 100).

Implementations of xor filters in other programming languages

xorfilter's People

Contributors

lemire avatar el10savio avatar ayazhafiz avatar mraerino avatar detailyang avatar canrau avatar veggiedefender avatar slimsag avatar github-actions[bot] avatar prataprc avatar funny-falcon avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.