Giter Club home page Giter Club logo

internment's Introduction

Internment โ€ƒ Latest version Documentation Build Status Windows Build status

Coverage status

Changelog

A very easy to use library for interning strings or other data in rust. Interned data is very efficient to either hash or compare for equality (just a pointer comparison). Data is also automatically de-duplicated.

You have three options with the internment crate:

  1. Intern, which will never free your data. This means that an Intern is Copy, so you can make as many copies of the pointer as you may care to at no cost. Its implementation also uses no unsafe code.

  2. ArcIntern, which reference-counts your data and frees it when there are no more references. ArcIntern will keep memory use down, but uses an atomic increment/decrement whenever a clone of your pointer is made, or a pointer is dropped. Requires feature arc.

  3. ArenaIntern, which stores its data in an Arena, with the data being freed when the arena itself is freed. Requires feature arena.

In each case, accessing your data is a single pointer dereference, and the size of any internment data structure (Intern or ArcIntern or ArenaIntern) is a single pointer. In each case, you have a guarantee that a single data value (as defined by Eq and Hash) will correspond to a single pointer value. This means that we can use pointer comparison (and a pointer hash) in place of value comparisons, which is very fast.

Also note that if you do not use the Intern type, you may wish to compile with cargo build --no-default-features --features arc (or arena), which will slightly speed up your build and trim down your executable size.

Example

use internment::Intern;
let x = Intern::new("hello");
let y = Intern::new("world");
assert_ne!(x, y);
println!("The conventional greeting is '{} {}'", x, y);

Arena example

use internment::Arena;
let arena: Arena<&'static str> = Arena::new();
let x = arena.intern("hello");
let y = arena.intern("world");
assert_ne!(x, y);
println!("The conventional greeting is '{} {}'", x, y);

Comparison with other interning crates

There are already several interning crates available on crates.io. What makes internment different? Many of the interning crates are specific to strings. The general purpose interning crates are:

  1. symtern

  2. shawshank

  3. intern which is a work-in-progress.

Each of these crates implement arena allocation, with tokens of various sizes to reference an interned object. This approach makes them far more challenging to use than internment. Their approach also enables freeing of all interned objects at once when they go out of scope (which is an advantage).

The primary disadvantages of arena allocation relative to internment's approach are:

  1. Lookup of a token could fail, either because an invalid token could be generated by hand, or a token from one pool could be used by another. This adds an element of unsafety to code that uses interned objects: either they assume that they are bug-free and panic on errors, or they have error handling any place that uses tokens.

  2. Lookup of a token could give the wrong object, if multiple pools are used. This is easy to avoid if you avoid ever using more than one pool, but then you may gain little benefit from the arena allocation.

  3. Lookup of a token is slow. They all advertise being fast, but any lookup is going to be slower than pointer dereferencing. To be fair, increased memory locality could in principle make token lookup faster for some access patterns, but I doubt it.

To balance this, because internment has tokens that are globally valid, it uses a Mutex to protect its internal data, which is taken on the interning of new data, which is probably slower than the other interning crates (unless you want to use their tokens across threads, in which case you'd have to put the pool in a Mutex and pay the same penalty).

Another interning crate which is very similar to internment is:

  1. hashconsing

The hashconsing crate is considerably more complicated in its API than internment, but generates global pointers in a similar way. The HConsed<T> data type is always referenced counted with Arc, which makes it similar to ArcIntern, which is less efficient than Intern, but does not eternally leak memory.

internment's People

Contributors

ahornby avatar droundy avatar dtolnay avatar felixtyson avatar jcapucho avatar michaelrawson avatar spl avatar stepancheg avatar thepuzzlemaker avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.