Giter Club home page Giter Club logo

tr13's Introduction

Purpose

Tr13 is a library for constructing and using read-only compact (memory-efficient) in-memory trie data structures, using raw (byte-sequences, byte[]) keys and values. Raw values can be automatically converted to/from basic JDK types such as java.lang.String. Resulting tries are "raw", in that they are stored as raw byte sequences (ByteBuffer, off-heap or in-heap, or byte[]), meaning that garbage-collection overhead should be minimal as the main data area is either off-heap (for native or mmap'ed ByteBuffers), or a single byte[].

Main benefits for using such tries is both relatively low memory usage (typically 20 - 35% less than raw input size) and low GC impact.

Development

Mailing list: http://groups.google.com/group/ning-tr13-users

Usage

Tries need to be built in lexicographic order, so pre-sorting may be needed. For that, you may want to check out Java Merge-sort project. Key and value types

Building is done in two steps:

  1. Construct raw trie structure in streaming fashion, either into a memory buffer (like ByteArrayOutputStream), or a File.
  2. Construct actual trie from serialized raw sequence (File, byte[]).

Two-phase processing is needed since the actual result size is not known in advance.

Sample

This tr13 from LevelDB gist shows simple usage.

Documentation

Check out [https://github.com/ning/tr13/wiki]

tr13's People

Contributors

cowtowncoder avatar

Stargazers

Ilcyb avatar Kishore Gopalakrishna avatar Knut Wannheden avatar  avatar Paul Findlay avatar Angus H. avatar MURAOKA Taro avatar Jesse Kuhnert avatar Johannes Wachter avatar Thomas Matthijs avatar tom.wen avatar Adriano Machado avatar Jonathan Simms avatar Cagatay Kavukcuoglu avatar wrmsr avatar John Claus avatar Emir Muñoz avatar Wolfgang Hoschek avatar Adam Zell avatar Yingpeng Xu avatar winglechen avatar Ashwin Jayaprakash avatar Chad Retz avatar Jacques Nadeau avatar Sunny Gleason avatar Alex Feinberg avatar Jeanfrancois Arcand avatar Tim Watson avatar Antoine Toulme avatar

Watchers

Phillip Pearson avatar Pierre-Alexandre Meyer avatar  avatar Jonathan Aquino avatar  avatar Tim Williamson avatar Tim Galyean avatar  avatar Mita Mahadevan avatar  avatar James Cloos avatar Kevin Postlewaite avatar Emir Muñoz avatar James Chang avatar Nitin Kalra avatar  avatar

tr13's Issues

Add Iterator to allow iterating over all entries of a trie

It would be useful to allow iteration over entries of a trie -- TrieDumper already implements this for simple output, but this would have other uses as well, esp. converting to other data structs.

Given that internal storage is nothing like that for Maps, the primary interface should NOT be Map's iterator (that would give wrong ideas of optimal usage), but for compatibility it is possible to also provide access as Map.Entry instances (for JDK Maps this is optimal interface because entries returned are part of Map; for trie there is no such internal structure to use).

Allow limited mutability (changing value)

Although current tr13 structure is fundamentally immutable in some respects -- can not add entries for example -- it would be possible to add some forms of mutability.

And specifically, it should be possible to change value of an existing entry, iff length of value is not changed.

It might even be possible to allow deletion by some use of tombstones; although this is something application might be able to fake by regular changing of value, using some bit marker within value to indicate "has been deleted" condition.

Add option to reorder entries during building, for optimization

Assuming uniform access, but non-uniform distribution of keys, it should be possible to order child nodes in decreasing order of size (ideally number of leaves, but serialized size should be close enough approximation), to reduce average lookup time.

Strange exception coming out of bytesToUnsigned

I'm loading up a tr13 into a byte buffer then attempting to use it from a memory mapped byte buffer and getting errors attempting to lookup values I put in the tr13. It isn't even very large yet. It appears to be attempting to pull up a value which is much larger than the generated file itself.

Here is the exception:
Exception in thread "main" java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:532)
at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:253)
at com.ning.tr13.util.VInt.bytesToUnsigned(VInt.java:165)
at com.ning.tr13.impl.bytes.ByteBufferBytesTrieLookup._findValue(ByteBufferBytesTrieLookup.java:81)
at com.ning.tr13.impl.bytes.ByteBufferBytesTrieLookup.findValue(ByteBufferBytesTrieLookup.java:36)
at MemoryMappedLookup.testString(MemoryMappedLookup.java:174)

Any thoughts it I'm somehow causing this? There is quite a lot of code to try and post a small sample. I will try and cook up something if this doesn't ring any bells with anybody.

Add convenience method(s) for build + create, using `ByteAggregator`

It would be nice to be able to combine build-in-memory, and creator of trie into one step, to be used for cases where disk-backed version is not necessary.

For this I can use ByteAggregator from StoreMate (or other projects I use it on).
Its benefit over ByteArrayOutputStream is that there's no need for constant reallocation, copying, as it is segment-based. This helps with GC. Plus it could even use off-heap buffer segments if that seems necessary.

Implement functionality to merge multiple tries

Given iterator functionality (see http://github.com/ning/tr13/issues#issue/2), it should be easy to implement merge functionality, where 2 (or perhaps more) input tries are merged into a result trie.
This assumes that tries are ordered the same way (meaning usually they are not reordered).

Functionality would be most useful for incremental building of tries, for example in (near) realtime cases where mutable HashMap (or such) is used for gathering new entries, which are regularly merged with immutable tries instance, to produce new immutable version.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.