
encprim

encprim is a serializer for primitive Python objects, similar to cPickle, json or msgpack, but written in pure Python. The reasons for writing it are explained in the Reasons section below.

Overview

  1. initially designed as an extension of the Python module struct
  2. later inspired by msgpack and pickle
  3. encodes None, bool, int/long (arbitrary size), float, complex, str, unicode, slice and bitarray
  4. supports nesting in tuple, list, set and dict
  5. detects and exploits type repetitions

Output Syntax

object: [count]type[data]
tuple / list / set: sequence of objects enclosed by () / [] / <>
dict: sequence of objects, keys first then values, enclosed by {}

Examples with data replaced by "."
(2TF2N) == (True, True, False, None, None)
{(2i..)()d.4s.} == {(3, 7): 1.2, (): 'text'}
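To make the notation concrete, here is a hypothetical Python 3 sketch that builds only the type skeleton of an object, with each data item replaced by "." as in the examples above. It is not encprim's encoder: it covers just the codes that appear in those two examples (T, F, N, i, d, s and the four bracket pairs), merges runs of equal scalar codes into a count prefix, and assumes the count in front of s is the string length, which is how 4s. for 'text' reads. The names skeleton, sequence, scalar_code and is_run_member are illustrative.

```python
# Hypothetical sketch (not encprim's code): type skeleton with data replaced by ".".
BRACKETS = {tuple: "()", list: "[]", set: "<>", dict: "{}"}


def scalar_code(obj):
    """Return (type code, data placeholder) for the scalar codes seen in the examples."""
    if obj is True:
        return "T", ""
    if obj is False:
        return "F", ""
    if obj is None:
        return "N", ""
    if isinstance(obj, int):
        return "i", "."
    if isinstance(obj, float):
        return "d", "."
    raise TypeError("type not covered by this sketch: %r" % type(obj))


def is_run_member(obj):
    """Only scalars (not containers, not strings) are merged into counted runs."""
    return type(obj) not in BRACKETS and not isinstance(obj, str)


def sequence(items):
    """Encode a flat sequence, collapsing repeated scalar type codes into [count]type."""
    out, i = [], 0
    while i < len(items):
        item = items[i]
        if not is_run_member(item):
            out.append(skeleton(item))
            i += 1
            continue
        code, data = scalar_code(item)
        run = 1
        while (i + run < len(items) and is_run_member(items[i + run])
               and scalar_code(items[i + run])[0] == code):
            run += 1
        out.append((str(run) if run > 1 else "") + code + data * run)
        i += run
    return "".join(out)


def skeleton(obj):
    if isinstance(obj, str):
        return "%ds." % len(obj)  # assumed: count = string length, as in 4s. for 'text'
    if type(obj) in BRACKETS:
        opening, closing = BRACKETS[type(obj)]
        # dicts: all keys first, then all values
        items = list(obj) + list(obj.values()) if isinstance(obj, dict) else list(obj)
        return opening + sequence(items) + closing
    return sequence([obj])


print(skeleton((True, True, False, None, None)))  # -> (2TF2N)
print(skeleton({(3, 7): 1.2, (): 'text'}))        # -> {(2i..)()d.4s.}
```

The run-length step is what "detects and exploits type repetitions" refers to in the overview: five items become three type codes.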

Getting Started

Copy the encprim directory to a location on Python's module search path (sys.path):

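import encprim

# round trip of a single value
x = 1
a = encprim.encode(x)
print(repr(a))
b = encprim.decodes(a)
assert b == x

# bitarray objects are supported (requires the third-party bitarray package)
import bitarray
print(repr(encprim.encode(bitarray.bitarray('11011'))))

# container types are disabled by default and must be enabled explicitly
encprim.enableTypes([tuple, list, set, dict])
print(repr(encprim.encode({(): [set()]})))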

encode returns None if the object contains a type that cannot be encoded or a recursive (self-referencing) structure. By default all container types (tuple, list, set, dict) are disabled for performance reasons (see Performance below). Use cPickle for these types, or enable them with enableTypes.
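The rules above can be illustrated with a minimal, hypothetical Python 3 sketch; it is not encprim's implementation. is_encodable, SCALARS, CONTAINERS and the enabled parameter are illustrative names: enabled plays the role of enableTypes and defaults to no containers, and a container that is already on the current path counts as recursion. bitarray is left out to avoid the dependency.

```python
# Hypothetical sketch, not encprim's code: which objects an encoder like this accepts.
SCALARS = (type(None), bool, int, float, complex, str, bytes, slice)
CONTAINERS = (tuple, list, set, dict)


def is_encodable(obj, enabled=(), _path=None):
    """True if obj holds only scalars and enabled container types, without cycles."""
    if _path is None:
        _path = set()
    if isinstance(obj, SCALARS):
        return True
    if type(obj) not in enabled:
        return False  # unsupported type, or a container type that is disabled
    if id(obj) in _path:
        return False  # obj (indirectly) contains itself: recursion
    _path.add(id(obj))
    try:
        children = list(obj) + list(obj.values()) if isinstance(obj, dict) else obj
        return all(is_encodable(child, enabled, _path) for child in children)
    finally:
        _path.discard(id(obj))  # shared but non-recursive references stay allowed


loop = []
loop.append(loop)
print(is_encodable({(): [set()]}))                      # False: containers disabled
print(is_encodable({(): [set()]}, enabled=CONTAINERS))  # True
print(is_encodable(loop, enabled=CONTAINERS))           # False: recursion
```

Tracking only the ids on the current path, rather than every id ever seen, keeps a list like [a, a] encodable while still rejecting a list that contains itself.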

You can run the test suite by executing the __init__.py file directly; __init__.out contains example output.
Pass the argument "-i" to enter interactive mode, where you can type in Python objects; for each one, the encoded value is printed along with its size in bytes and its size ratio relative to pickle (lower is better).

Reasons

  • why not use pickle

pickle is great, especially its ability to serialize almost anything, including functions and classes defined at __main__ level thanks to Oren Tirosh's monkey patch*. But if you need to serialize a large number of small objects, every byte may count.

  • why not use existing serializers such as json or msgpack

Besides the external dependency (in the case of msgpack), these formats were designed for other purposes and therefore do not map exactly onto Python types. For example**, json dictionary keys must be strings, and msgpack cannot distinguish between tuple and list.
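The json part of this point is easy to check with Python 3's standard library. The sketch below round-trips a few values whose types encprim's syntax keeps apart; msgpack is not shown because it is a third-party package.

```python
import json

# Tuples come back as lists.
print(json.loads(json.dumps((1, 2))))   # [1, 2]

# Non-string keys of basic types are silently converted to strings...
print(json.loads(json.dumps({1: "a"})))  # {'1': 'a'}

# ...and tuple keys are rejected outright.
try:
    json.dumps({(3, 7): 1.2})
except TypeError as exc:
    print("TypeError:", exc)
```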

  • why not?

I found that pickle is quite efficient, with one exception: complex numbers. The output should not be much larger than two doubles, but it is noticeably larger. pickle also seems to store a lot of unnecessary information when it is fed a bitarray. This type is not built in, but I love it and use it extensively.
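The complex-number overhead can be observed with a short Python 3 sketch that compares pickle's output for every available protocol with the 16 bytes of two packed IEEE 754 doubles. Exact pickle sizes depend on the protocol and Python version, so the code prints them instead of stating them.

```python
import pickle
import struct

z = complex(1.5, -2.25)

# Lower bound: real and imaginary parts as two little-endian doubles (16 bytes).
raw = struct.pack("<dd", z.real, z.imag)

for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    size = len(pickle.dumps(z, protocol))
    print("protocol %d: %d bytes (two doubles: %d bytes)" % (protocol, size, len(raw)))

# The packed form round-trips exactly.
assert complex(*struct.unpack("<dd", raw)) == z
```

Most of pickle's extra bytes go into naming the complex constructor, which a format with a one-character type code does not need.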

So I decided to put some effort into this module. I hope you like it or find a use case for it.

Performance

According to the test suite, encprim's output is on average 40% smaller than pickle's. Depending on the data type, the saving can exceed 90% (bitarrays shorter than 16 bits) or become insignificant (big integers). By design, the best results are achieved with single values or flat collections of values of the same type.

Regarding runtime, the size optimizations come at a price. Compared to pickle, which is a fair baseline because it is also pure Python, encprim is on average about twice as fast when encoding and only a few percent faster when decoding. Encoding and decoding single values, however, is about 5 times faster than with pickle. cPickle is the fastest in every case, but encprim comes very close when encoding or decoding single values :)
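A comparison of this kind can be reproduced with a timeit harness. The sketch below assumes Python 3, where the C accelerator has taken over cPickle's role and the pure-Python pickler is only reachable through CPython's private pickle._dumps; time_encoders is a hypothetical helper. To include encprim, add encprim.encode to the encoders dict on a Python version that encprim supports. No timings are quoted, since they depend on the machine.

```python
import pickle
import timeit


def time_encoders(obj, encoders, number=10000):
    """Return the seconds each encoder needs to serialize obj `number` times."""
    return {name: timeit.timeit(lambda f=f: f(obj), number=number)
            for name, f in encoders.items()}


encoders = {
    # CPython-private pure-Python implementation (the role pickle played in Python 2)
    "pickle (pure Python)": lambda o: pickle._dumps(o, 2),
    # C implementation (the role cPickle played in Python 2)
    "pickle (C)": lambda o: pickle.dumps(o, 2),
}

for value in (42, 1.5, complex(1, 2)):
    timings = time_encoders(value, encoders, number=2000)
    print(repr(value), {name: round(t, 4) for name, t in timings.items()})
```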

* http://code.activestate.com/recipes/572213-pickle-the-interactive-interpreter-state/
** to the best of my knowledge
