sophy's Introduction

sophy, fast Python bindings for Sophia embedded database, v2.2.

About sophy

  • Written in Cython for speed and low-overhead
  • Clean, memorable APIs
  • Extensive support for Sophia's features
  • Python 2 and Python 3 support
  • No 3rd-party dependencies besides Cython
  • Documentation on readthedocs

About Sophia

  • Ordered key/value store
  • Keys and values can be composed of multiple fields and data-types
  • ACID transactions
  • MVCC, optimistic, non-blocking concurrency with multiple readers and writers.
  • Multiple databases per environment
  • Multiple- and single-statement transactions across databases
  • Prefix searches
  • Automatic garbage collection and key expiration
  • Hot backup
  • Compression
  • Multi-threaded compaction
  • mmap support, direct I/O support
  • APIs for a variety of statistics on storage engine internals
  • BSD licensed

Some ideas of where Sophia might be a good fit

  • Running on application servers, low-latency / high-throughput
  • Time-series
  • Analytics / Events / Logging
  • Full-text search
  • Secondary-index for external data-store

Limitations

  • Not tested on Windows.

If you encounter any bugs in the library, please open an issue, including a description of the bug and any related traceback.

Installation

The sophia sources are bundled with the sophy source code, so the only thing you need to install is Cython. You can install from GitHub or from PyPI.

Pip instructions:

$ pip install Cython
$ pip install sophy

Or to install the latest code from master:

$ pip install -e git+https://github.com/coleifer/sophy#egg=sophy

Git instructions:

$ pip install Cython
$ git clone https://github.com/coleifer/sophy
$ cd sophy
$ python setup.py build
$ python setup.py install

To run the tests:

$ python tests.py


Overview

Sophy is very simple to use. It acts like a Python dict object, but in addition to normal dictionary operations, you can read slices of data that are returned efficiently using cursors. Similarly, bulk writes using update() use an efficient, atomic batch operation.

Despite the simple APIs, Sophia has quite a few advanced features. There is too much to cover everything in this document, so be sure to check out the official Sophia storage engine documentation.

The next section will show how to perform common actions with sophy.

Using Sophy

Let's begin by importing sophy and creating an environment. The environment can host multiple databases, each of which may have a different schema. In this example our database will store a UTF-8 string as the key and another as the value. Finally we'll open the environment so we can start storing and retrieving data.

from sophy import Sophia, Schema, StringIndex

# Instantiate our environment by passing a directory path which will store the
# various data and metadata for our databases.
env = Sophia('/path/to/store/data')

# We'll define a very simple schema consisting of a single utf-8 string for the
# key, and a single utf-8 string for the associated value.
schema = Schema(key_parts=[StringIndex('key')],
                value_parts=[StringIndex('value')])

# Create a key/value database using the schema above.
db = env.add_database('example_db', schema)

if not env.open():
    raise Exception('Unable to open Sophia environment.')

CRUD operations

Sophy databases use the familiar dict APIs for CRUD operations:

db['name'] = 'Huey'
db['animal_type'] = 'cat'
print('%s is a %s' % (db['name'], db['animal_type']))  # Huey is a cat

'name' in db  # True
'color' in db  # False

db['temp_val'] = 'foo'
del db['temp_val']
print(db['temp_val'])  # raises a KeyError.

Use update() for bulk-insert, and multi_get() for bulk-fetch. Unlike __getitem__(), calling multi_get() with a non-existent key does not raise an exception; it returns None for that key instead. multi_get_dict() returns a dictionary containing only the keys that were found.

db.update(k1='v1', k2='v2', k3='v3')

for value in db.multi_get('k1', 'k3', 'kx'):
    print(value)
# v1
# v3
# None

result_dict = db.multi_get_dict(['k1', 'k3', 'kx'])
# {'k1': 'v1', 'k3': 'v3'}
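
The difference between the two bulk-fetch calls can be modelled on a plain Python dict. This is only a sketch of the behaviour described above, not sophy's implementation; the two helper functions are hypothetical stand-ins that take the store as an explicit argument.

```python
def multi_get(store, *keys):
    """Return one value per requested key, using None for missing keys."""
    return [store.get(key) for key in keys]


def multi_get_dict(store, keys):
    """Return a dict holding only the requested keys that exist."""
    return {key: store[key] for key in keys if key in store}


# A plain dict stands in for the database in this sketch.
store = {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
found = multi_get(store, 'k1', 'k3', 'kx')
found_dict = multi_get_dict(store, ['k1', 'k3', 'kx'])
```

The list form keeps positions aligned with the requested keys, while the dict form simply drops anything that is missing.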

Other dictionary methods

Sophy databases also provide efficient implementations of keys(), values() and items(). Unlike dictionaries, however, iterating directly over a Sophy database returns the equivalent of items() (as opposed to just the keys):

db.update(k1='v1', k2='v2', k3='v3')

list(db)
# [('k1', 'v1'), ('k2', 'v2'), ('k3', 'v3')]


db.items()
# same as above.


db.keys()
# ['k1', 'k2', 'k3']


db.values()
# ['v1', 'v2', 'v3']

There are two ways to get the count of items in a database. You can use the len() function, which is not very efficient since it must allocate a cursor and iterate through the full database. An alternative is the index_count property, which may not be exact as it includes transactional duplicates and not-yet-merged duplicates.

print(len(db))
# 4

print(db.index_count)
# 4

Fetching ranges

Because Sophia is an ordered data-store, performing ordered range scans is efficient. To retrieve a range of key-value pairs with Sophy, use the ordinary dictionary lookup syntax, passing a slice instead of a single key.

db.update(k1='v1', k2='v2', k3='v3', k4='v4')


# Slice key-ranges are inclusive:
db['k1':'k3']
# [('k1', 'v1'), ('k2', 'v2'), ('k3', 'v3')]


# Inexact matches are fine, too:
db['k1.1':'k3.1']
# [('k2', 'v2'), ('k3', 'v3')]


# Leave the start or end empty to retrieve from the first/to the last key:
db[:'k2']
# [('k1', 'v1'), ('k2', 'v2')]

db['k3':]
# [('k3', 'v3'), ('k4', 'v4')]


# To retrieve a range in reverse order, use the higher key first:
db['k3':'k1']
# [('k3', 'v3'), ('k2', 'v2'), ('k1', 'v1')]

To retrieve a range in reverse order where the start or end is unspecified, you can pass in True as the step value of the slice to also indicate reverse:

db[:'k2':True]
# [('k2', 'v2'), ('k1', 'v1')]

db['k3'::True]
# [('k4', 'v4'), ('k3', 'v3')]

db[::True]
# [('k4', 'v4'), ('k3', 'v3'), ('k2', 'v2'), ('k1', 'v1')]
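
The slice rules shown above (inclusive bounds, inexact bounds, open ends, and reversal) can be modelled in pure Python over a sorted list. This is a sketch for reasoning about the semantics only; range_scan is a hypothetical helper, and sophy itself uses cursors rather than an in-memory list.

```python
from bisect import bisect_left, bisect_right


def range_scan(items, start=None, stop=None, reverse=False):
    """Return (key, value) pairs whose key lies in the inclusive range.

    `items` must be a list of (key, value) tuples sorted by key.
    """
    keys = [key for key, _ in items]
    # A higher start than stop means "same range, reverse order".
    if start is not None and stop is not None and start > stop:
        start, stop, reverse = stop, start, True
    # bisect_left/bisect_right make both bounds inclusive and also
    # handle bounds that do not match any stored key exactly.
    lo = 0 if start is None else bisect_left(keys, start)
    hi = len(keys) if stop is None else bisect_right(keys, stop)
    result = items[lo:hi]
    return result[::-1] if reverse else result


items = [('k1', 'v1'), ('k2', 'v2'), ('k3', 'v3'), ('k4', 'v4')]
inexact = range_scan(items, 'k1.1', 'k3.1')
backwards = range_scan(items, stop='k2', reverse=True)
```

Here reverse=True plays the role of passing True as the slice step.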

Cursors

For finer-grained control over iteration, or to do prefix-matching, Sophy provides a cursor interface.

The cursor() method accepts 5 parameters:

  • order (default=>=) -- semantics for matching the start key and ordering results.
  • key -- the start key
  • prefix -- search for prefix matches
  • keys (default=True) -- return keys while iterating
  • values (default=True) -- return values while iterating

Suppose we were storing events in a database and were using an ISO-8601-formatted date-time as the key. Since ISO-8601 sorts lexicographically, we could retrieve events in correct order simply by iterating. To retrieve a particular slice of time, a prefix could be specified:

# Iterate over events for July, 2017:
for timestamp, event_data in db.cursor(key='2017-07-01T00:00:00',
                                       prefix='2017-07-'):
    do_something()
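
To see why a prefix selects a contiguous slice of time, here is a pure-Python sketch that does not use sophy at all. It assumes every timestamp uses the same ISO-8601 layout and time zone, which is what makes string order match chronological order; prefix_scan is a hypothetical helper.

```python
from bisect import bisect_left


def prefix_scan(sorted_keys, prefix):
    """Yield the keys in a sorted list that start with `prefix`."""
    # All keys sharing a prefix sit next to each other in sorted order,
    # so seek to the first candidate and stop at the first mismatch.
    index = bisect_left(sorted_keys, prefix)
    while index < len(sorted_keys) and sorted_keys[index].startswith(prefix):
        yield sorted_keys[index]
        index += 1


timestamps = sorted([
    '2017-08-01T09:30:00',
    '2017-07-15T12:00:00',
    '2017-06-30T23:59:59',
    '2017-07-01T00:00:00',
])
july = list(prefix_scan(timestamps, '2017-07-'))
```

An ordered store can answer such a query with one seek followed by a sequential read, without scanning the rest of the keys.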

Transactions

Sophia supports ACID transactions. Even better, a single transaction can cover operations to multiple databases in a given environment.

Example usage:

account_balance = env.add_database('balance', ...)
transaction_log = env.add_database('transaction_log', ...)

# ...

def transfer_funds(from_acct, to_acct, amount):
    with env.transaction() as txn:
        # To write to a database within a transaction, obtain a reference to
        # a wrapper object for the db:
        txn_acct_bal = txn[account_balance]
        txn_log = txn[transaction_log]

        # Transfer the asset by updating the respective balances. Note that we
        # are operating on the wrapper database, not the db instance.
        from_bal = txn_acct_bal[from_acct]
        to_bal = txn_acct_bal[to_acct]
        txn_acct_bal[to_acct] = to_bal + amount
        txn_acct_bal[from_acct] = from_bal - amount

        # Log the transaction in the transaction_log database. Again, we use
        # the wrapper for the database:
        txn_log[from_acct, to_acct, get_timestamp()] = amount

Multiple transactions are allowed to be open at the same time, but if there are conflicting changes, an exception will be thrown when attempting to commit the offending transaction:

# Create a basic k/v store. Schema.key_value() is a convenience/factory-method.
kv = env.add_database('main', Schema.key_value())

# ...

# Instead of using the context manager, we'll call begin() explicitly so we
# can show the interaction of 2 open transactions.
txn = env.transaction().begin()

t_kv = txn[kv]
t_kv['k1'] = 'v1'

txn2 = env.transaction().begin()
t2_kv = txn2[kv]

t2_kv['k1'] = 'v1-x'

txn2.commit()  # ERROR !!
# SophiaError('txn is not finished, waiting for concurrent txn to finish.')

txn.commit()  # OK

# Try again?
txn2.commit()  # ERROR !!
# SophiaError('transaction rolled back by another concurrent transaction.')
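
With optimistic concurrency, a common way to handle a rolled-back transaction is to re-run the whole unit of work. The following is a generic sketch of that pattern, not part of sophy: ConflictError is a hypothetical stand-in for whatever exception your code sees on a conflicting commit, and run_with_retry is an illustrative helper.

```python
import time


class ConflictError(Exception):
    """Hypothetical stand-in for the error raised by a conflicting commit."""


def run_with_retry(operation, attempts=3, delay=0.0, conflict_exc=ConflictError):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    `operation` should open its own transaction, do its reads and writes,
    and commit, so that every retry starts from fresh data.
    """
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except conflict_exc as exc:
            last_error = exc
            time.sleep(delay)
    # Every attempt hit a conflict: surface the last error to the caller.
    raise last_error
```

The important design point is that the retried callable re-reads its inputs, since the values read by the failed attempt may have been changed by the transaction that won.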

Index types, multi-field keys and values

Sophia supports multi-field keys and values. Additionally, the individual fields can have different data-types. Sophy provides the following field types:

  • StringIndex - stores UTF8-encoded strings, e.g. text.
  • BytesIndex - stores bytestrings, e.g. binary data.
  • JsonIndex - stores arbitrary objects as UTF8-encoded JSON data.
  • MsgPackIndex - stores arbitrary objects using msgpack serialization.
  • PickleIndex - stores arbitrary objects using Python pickle library.
  • UUIDIndex - stores UUIDs.
  • U64Index and reversed, U64RevIndex
  • U32Index and reversed, U32RevIndex
  • U16Index and reversed, U16RevIndex
  • U8Index and reversed, U8RevIndex
  • SerializedIndex - which is basically a BytesIndex that accepts two functions: one for serializing the value to the db, and another for deserializing.

To store arbitrary data encoded using msgpack, you could use MsgPackIndex:

schema = Schema(StringIndex('key'), MsgPackIndex('value'))
db = sophia_env.add_database('main', schema)
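
For SerializedIndex, the two functions only need to convert between a Python object and a bytestring and back. Below is a minimal sketch of such a pair using compressed JSON from the standard library. The function names are illustrative, and how the pair is passed to SerializedIndex is not shown in this document, so check the API documentation for the exact constructor arguments.

```python
import json
import zlib


def serialize(obj):
    """Encode a JSON-compatible object as compressed bytes."""
    text = json.dumps(obj, sort_keys=True)
    return zlib.compress(text.encode('utf-8'))


def deserialize(data):
    """Decode bytes produced by serialize() back into an object."""
    return json.loads(zlib.decompress(data).decode('utf-8'))


record = {'name': 'Huey', 'animal_type': 'cat', 'tags': ['white', 'fluffy']}
blob = serialize(record)
restored = deserialize(blob)
```

Whatever pair you choose, deserialize(serialize(x)) must give back an equal object, otherwise reads will not match writes.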

To declare a database with a multi-field key or value, you will pass the individual fields as arguments when constructing the Schema object. To initialize a schema where the key is composed of two strings and a 64-bit unsigned integer, and the value is composed of a string, you would write:

key = [StringIndex('last_name'), StringIndex('first_name'), U64Index('area_code')]
value = [StringIndex('address_data')]
schema = Schema(key_parts=key, value_parts=value)

address_book = sophia_env.add_database('address_book', schema)

To store data, we use the same dictionary methods as usual, just passing tuples instead of individual values:

sophia_env.open()

address_book['kitty', 'huey', 66604] = '123 Meow St'
address_book['puppy', 'mickey', 66604] = '1337 Woof-woof Court'

To retrieve our data:

huey_address = address_book['kitty', 'huey', 66604]

To delete a row:

del address_book['puppy', 'mickey', 66604]

Indexing and slicing work as you would expect.

Note: when working with a multi-part value, a tuple containing the value components will be returned. When working with a scalar value, instead of returning a 1-item tuple, the value itself is returned.
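
The rule in the note can be expressed in a few lines. This is a sketch of the described behaviour only, with unwrap_value as a hypothetical helper name; it is not taken from sophy's source.

```python
def unwrap_value(parts):
    """Return a lone value as a scalar and several values as a tuple."""
    parts = tuple(parts)
    return parts[0] if len(parts) == 1 else parts


single = unwrap_value(['123 Meow St'])
multiple = unwrap_value(['123 Meow St', 66604])
```

In practice this means code reading a single-value schema can use the result directly, while code reading a multi-value schema should unpack a tuple.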

Configuring and Administering Sophia

Sophia can be configured using special properties on the Sophia and Database objects. Refer to the configuration document for the details on the available options, including whether they are read-only, and the expected data-type.

For example, to query Sophia's status, you can use the status property, which is a read-only setting that returns a string:

print(env.status)
# online

Other properties can be changed by assigning a new value to the property. For example, to read and then increase the number of threads used by the scheduler:

nthreads = env.scheduler_threads
env.scheduler_threads = nthreads + 2

Database-specific properties are available as well. For example, to get the number of GET and SET operations performed on a database, you would write:

print(db.stat_get, 'get operations')
print(db.stat_set, 'set operations')

Refer to the documentation for complete lists of settings. Dotted-paths are translated into underscore-separated attributes.
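
The name translation itself is mechanical. As a sketch, assuming an environment-level setting documented by Sophia under a dotted path such as scheduler.threads, the matching attribute name could be derived like this (config_attr_name is a hypothetical helper, not part of sophy):

```python
def config_attr_name(dotted_path):
    """Translate a dotted configuration path into an attribute name."""
    return dotted_path.replace('.', '_')


attr = config_attr_name('scheduler.threads')
# With an open environment you could then read the setting dynamically:
#     value = getattr(env, attr)
```

This is convenient when looping over a list of setting names taken from the Sophia documentation.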

sophy's Issues

Unable to create new database. KeyError: 'db-test'

sophy version is current git HEAD. Python 3.4.3

This error occurs when I try to create a new database or run the test suite.

sophy git:(master) ✗ python tests.py
Exception ignored in: 'sophy._BaseDBObject.open'
MemoryError: Unable to allocate object: <sophy.Database object at 0x7fb40aa92348>.
EException ignored in: 'sophy._BaseDBObject.open'
MemoryError: Unable to allocate object: <sophy.Database object at 0x7fb40aaacd68>.

........(lots of these errors)

ERROR: test_view (__main__.TestU32Index)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests.py", line 40, in setUp
    super(BaseSophiaTestMethods, self).setUp()
  File "tests.py", line 21, in setUp
    self.db = self.sophia[DB_NAME]
  File "sophy.pyx", line 240, in sophy.Sophia.__getitem__ (sophy.c:6545)
    return self.dbs[name]
KeyError: 'db-test'

----------------------------------------------------------------------
Ran 39 tests in 0.076s

FAILED (errors=39)

Can't create new environment

When I run the following code:

import sophy
# assumes that /tmp/foo does not exist before the script is run
env = Sophia('/tmp/foo')
db = env.create_database('mydb', 'string')

I get the following error:

Exception ignored in: 'sophy.Sophia.open'
Traceback (most recent call last):
  File "sophy.pyx", line 154, in sophy._ConfigManager.apply_all (sophy.c:6382)
  File "sophy.pyx", line 150, in sophy._ConfigManager.apply_config (sophy.c:6215)
  File "sophy.pyx", line 73, in sophy._check (sophy.c:3704)
Exception: sophia/runtime/sr_conf.c:339 bad configuration path: db.b'foo'.mmap

Is there any configuration I should do?

Closing database safely

I'm not sure whether there is a dedicated method for this in the API (I didn't find one), but I get segfaults when I try to close the Sophia instance through the close method defined here: https://github.com/coleifer/sophy/blob/master/sophy.pyx#L545

This issue came out of a problem with Flask's code reloader, which opened too many files, resulting in:
Exception Exception: Exception("sophia/log/sl.c:61 log file '' open error: Too many open files",) in 'sophy.Sophia.open' ignored

Import Error undefined symbol: PyString_FromStringAndSize

Looks like there is a python 3 issue.

Using Python 3.4.3

In [1]: import sophy
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-75851a3a749b> in <module>()
----> 1 import sophy

ImportError: /home/vagrant/miniconda/envs/env/lib/python3.4/site-packages/sophy.cpython-34m.so: undefined symbol: PyString_FromStringAndSize

Install logs:

Collecting Cython
Collecting sophy
  Downloading sophy-0.1.2.tar.gz (297kB)
    100% |████████████████████████████████| 299kB 765kB/s
Building wheels for collected packages: sophy
  Running setup.py bdist_wheel for sophy
  Stored in directory: /home/vagrant/.cache/pip/wheels/ed/72/4d/36acd1b75237dd93ebf51be25bea18d011dd72ad71fe6a8751
Successfully built sophy
Installing collected packages: Cython, sophy
Successfully installed Cython-0.23.4 sophy-0.1.2

How to re-open existing database?

Creating the database works fine using the example given at https://sophy.readthedocs.io/en/latest/api.html#Database , and it is stored to disk. At that point I can write values to it and read them back.

But when I comment out the add_database line and use only get_database, hoping to reopen the existing stored database without overwriting it with a new (and therefore empty) one, I get this:

    kv_db = env.get_database('kv')
  File "sophy.pyx", line 251, in sophy.Sophia.get_database
KeyError: 'kv'

Is there something i'm doing incorrectly or have misunderstood?

My 'env' folder looks like this:

test/kv
test/kv/00000000000000000001.db
test/kv/scheme

Any help or tips are appreciated.

Querying a part of index and value?

Is it possible to query by only part of a multi-field key, or even to query by value?
Suppose we have a schema:

schema = Schema(key_parts=[StringIndex('key'), StringIndex('predicate')],
                value_parts=[StringIndex('value')])

Since StringIndex is used for the key, the predicate and the value, can I query by part of the key or by a specific value?

Python3 ?

Any estimate of when Python3 support will be happening?

Segfault

Just installed sophy using the git repo. Created a simple database and exited resulting in a segfault. Ran again and database appeared OK, but segfaulted again on exit. OS X El Capitan.

ipython
Python 2.7.10 (default, Oct 23 2015, 18:05:06)
Type "copyright", "credits" or "license" for more information.

IPython 4.0.3 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from sophy import Sophia

In [2]: env = Sophia('/tmp/sophia-env', [('test-db', 'string')])

In [3]: db = env['test-db']

In [4]: env.open()
Out[4]: False

In [5]: db = env['test-db']

In [6]: db['k3']
Out[6]: '5'

In [7]: list(db[::True])
Out[7]: [('k3', '5'), ('k2', 'v2'), ('k1', 'v1')]

In [8]: env.close()
Out[8]: True

In [9]:
Do you really want to exit ([y]/n)?
Segmentation fault: 11

unbound memory problem

Origin: https://groups.google.com/forum/#!topic/sophia-database/nX5HbdF7mcE

Using the sophy bindings and Sophia 2.1, I am trying to load a file of several GB into a database, but even after setting the memory limit, memory consumption grows without bound.

What am I doing wrong?

Here's more or less the code I use:

from sophy import Sophia
from msgpack import loads, dumps

env = Sophia(db, auto_open=False)
env.memory_limit = 1024 * 1024 * 1024
env.open()

posts = env.create_database('Post', 'u32')

filepath = os.path.join('dump', 'Posts.xml')
for data in iterate(filepath):
    posts[int(data['Id'])] = dumps(data)
env.close()

Can we please check that memory.limit is set correctly?
Thanks!

Segmentation fault when key is U32Index

Hello. I am trying to create a database with a U32Index as the key.

from sophy import Sophia, Schema, BytesIndex, U32Index

env = Sophia('/tmp')

schema = Schema(
    key_parts=[U32Index('key')],
    value_parts=[BytesIndex('value')]
)

db = env.add_database('test_num_key', schema)

if not env.open():
    raise RuntimeError('Unable to open Sophia environment.')

but the interpreter crashes.

BTW: cool binding, great work!

All values I get from sophy are None, although the keys are correct

Here is the content of my test.py:

#!/usr/bin/env python
# coding: utf-8

from sophy import *
from random import randint, choice

CS = 'abcdefghijklmnopqrstuvwxyz'
genstr = lambda n=4: ''.join(choice(CS) for idx in xrange(n))


def random_insert():
        env  = Sophia('./data/cache')
        schema = Schema(key_parts=[StringIndex('key')], value_parts=[StringIndex('val')])
        db = env.add_database('poi', schema)


        env.open()
        for idx in xrange(10):
                k, v = genstr(6), str(randint(0, 9999))
                db[k] = v
        env.close()

def show():
        env  = Sophia('./data/cache')
        schema = Schema(key_parts=[StringIndex('key')], value_parts=[StringIndex('val')])
        db = env.add_database('poi', schema)


        env.open()
        for k,v in db:
                print k,v

        env.close()

show()


And this is the output of show():

ubuntu@devops:/data_ext/jboard$ python ./test.py 
a None
cksbtp None
hixdik None
iogfbz None
mufzys None
rjqnyt None
rrujgz None
sxbwno None
ueulqi None
xbfeff None
yzouqe None

writing, closing, and then reopening a Sophia db in one Python session results in segfault on exit

If I create and close a Sophia db and then try to open and read from it in the same Python session, Python segfaults on exit (i.e., after printing the db contents in the example below). Creating the db in one session and then opening and reading from it in another succeeds without any segfault. I'm using bddf3da with Python 2.7.12 on MacOS 10.11.6:

import sophy

# assumes that /tmp/foo does not exist before the script is run
env = sophy.Sophia('/tmp/foo', [('foo', 'string')])
db = env['foo']
db['a'] = 'xxx'
db['b'] = 'yyy'
db['c'] = 'zzz'
env.close()

env = sophy.Sophia('/tmp/foo', [('foo', 'string')])
db = env['foo']
for (k, v) in db:
    print k, v
env.close()

Encoding problems?

Hi,

I used virtualenv to set up a Python 3.4.3; then did "pip install cython", cloned master branch of sophy (commit 2cac2f5 as HEAD), followed by successful "python setup.py build" and "python setup.py install".

When running "python tests.py", all tests fail due to:
Exception: sophia/runtime/sr_conf.c:339 bad configuration path: db.b'db-test'.mmap

Full output (all test cases fail in the same way):

ERROR: test_version (__main__.TestConfiguration)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests.py", line 20, in setUp
    self.sophia = self.create_env()
  File "tests.py", line 29, in create_env
    return Sophia(TEST_DIR, [(DB_NAME, self._index_type)])
  File "sophy.pyx", line 208, in sophy.Sophia.__init__ (sophy.c:6464)
    self.open()
  File "sophy.pyx", line 226, in sophy.Sophia.open (sophy.c:6733)
    self.config.apply_all()
  File "sophy.pyx", line 159, in sophy._ConfigManager.apply_all (sophy.c:5877)
    self.apply_config(key, value)
  File "sophy.pyx", line 155, in sophy._ConfigManager.apply_config (sophy.c:5707)
    _check(self.sophia.handle, rc)
  File "sophy.pyx", line 78, in sophy._check (sophy.c:3297)
    raise Exception(error)
Exception: sophia/runtime/sr_conf.c:339 bad configuration path: db.b'db-test'.mmap

Same b'db-test' appears when building from the tag 0.1.6.

Not sure what causes this. Looks like something with encoding/decoding... maybe whatever is encoded must be decoded when used later for string concatenation (and subsequent re-encode).

I also wonder if this is somehow an issue with locales. Mine are set to sv_SE.UTF-8.

How to limit the memory used by sophia

Héllo,

I'd like to use sophia for my sotoki project to replace sqlite, pgsql or wiredtiger.

Here is my code:

from sophy import Sophia
from msgpack import loads, dumps

env = Sophia(db, auto_open=False)
env.memory_limit = 1024 * 1024 * 1024
env.open()

posts = env.create_database('Post', 'u32')

filepath = os.path.join(dump, klass.filename())
for data in iterate('dump'):
    posts[int(data['Id'])] = dumps(data)
for key, data in posts.cursor():
    uid = key
    values = loads(data)
env.close()

But Sophia seems to use all the available memory even though I've set a memory limit. What am I doing wrong?

Project License

The Sophia license is BSD, but I could not find any mention of a license for sophy.
