btrdb-python's Introduction

BTrDB Bindings for Python

These are the BTrDB bindings for Python, giving you painless and productive access to the Berkeley Tree Database (BTrDB). BTrDB is a time series database focused on blazing speed for univariate time series data at the nanosecond scale.

Sample Code

Our goal is to make BTrDB as easy to use as possible, focusing on integration with other tools and the productivity of our users. In keeping with this, we continue to add new features such as easy transformation to numpy arrays, pandas Series, etc. See the sample code below and then check out our documentation for more in-depth instructions.

import btrdb

# connect to database
conn = btrdb.connect("192.168.1.101:4410")

# view time series streams found at provided collection path
streams = conn.streams_in_collection("USEAST_NOC1/90807")
for stream in streams:
    print(stream.name)

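# retrieve a given stream by UUID and print out data points.  In value
# queries you will receive RawPoint instances which contain a time and value
# attribute.  Here start and end are nanosecond timestamps that bound the
# query; define them before running this snippet.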
stream = conn.stream_from_uuid("71466a91-dcfe-42ea-9e88-87c51f847942")
for point, _ in stream.values(start, end):
    print(point)
>> RawPoint(1500000000000000000, 1.0)
>> RawPoint(1500000000100000000, 2.0)
>> RawPoint(1500000000200000000, 3.0)
...

# view windowed data.  Each StatPoint contains the time of the window and
# common statistical data such as min, mean, max, count, and standard
# deviation of values the window covers.  See docs for more details.
width = 300000000
depth = 20
for point, _ in stream.windows(start=start, end=end,
                               width=width, depth=depth):
    print(point)
>> StatPoint(1500000000000000000, 1.0, 2.0, 3.0, 3, 0.816496580927726)
>> StatPoint(1500000000300000000, 4.0, 5.0, 6.0, 3, 0.816496580927726)
>> StatPoint(1500000000600000000, 7.0, 8.0, 9.0, 3, 0.816496580927726)

You can also easily work with a group of streams for when you need to evaluate data across multiple time series or serialize to disk.

from datetime import datetime

from btrdb.utils.timez import to_nanoseconds

start = to_nanoseconds(datetime(2018,1,1,9,0,0))
streams = conn.streams(*uuid_list)

# convert stream data to numpy arrays
data = streams.filter(start=start).to_array()

# serialize stream data to disk as CSV
streams.filter(start=start).to_csv("data.csv")

# convert stream data to a pandas DataFrame
streams.filter(start=start).to_dataframe()
>>                    time  NOC_1/stream0  NOC_1/stream1
    0  1500000000000000000            NaN            1.0
    1  1500000000100000000            2.0            NaN
    2  1500000000200000000            NaN            3.0
    3  1500000000300000000            4.0            NaN
    4  1500000000400000000            NaN            5.0
    5  1500000000500000000            6.0            NaN
    6  1500000000600000000            NaN            7.0
    7  1500000000700000000            8.0            NaN
    8  1500000000800000000            NaN            9.0
    9  1500000000900000000           10.0            NaN

Installation

See our documentation on installing the bindings for more detailed instructions. To get started quickly with the latest available version, use pip to install from PyPI. Conda support is coming in the near future.

$ pip install btrdb

Tests

This project includes a suite of automated tests based upon pytest. For your convenience, a Makefile has been provided with a target for evaluating the test suite. Use the following command to run the tests.

$ make test

Aside from basic unit tests, the test suite is configured to use pytest-flake8 for linting and style checking as well as coverage for measuring test coverage.

Note that the test suite has additional dependencies that must be installed for it to run successfully: pip install -r tests/requirements.txt.

Releases

This codebase uses GitHub Actions to control the release process. To create a new release of the software, run release.sh with arguments for the new version as shown below. Make sure you are on the master branch when running this script.

./release.sh 5 11 4

This will tag and push the current commit, and GitHub Actions will run the test suite, build the package, and push it to PyPI. If any issues are encountered with the automated tests, the build will fail and you will have a tag with no corresponding release.

After a release is created, you can manually edit the release description through GitHub.

Documentation

The project documentation is written in reStructuredText and is built using Sphinx, which also includes the docstring documentation from the btrdb Python package. For your convenience, the Makefile includes a target for building the documentation:

$ make html

This will build the HTML documentation locally in docs/build, which can be viewed using open docs/build/index.html. Other formats (PDF, epub, etc.) can be built using docs/Makefile. The documentation is automatically built on every GitHub release and hosted on Read the Docs.

Note that the documentation also requires Sphinx and other dependencies to build successfully: pip install -r docs/requirements.txt.

Versioning

This codebase uses a form of Semantic Versioning to structure version numbers. In general, the major version number will track with the BTrDB codebase to transparently maintain version compatibility. Planned features between major versions will increment the minor version, while any special releases (bug fixes, etc.) will increment the patch number.

btrdb-python's People

Contributors

aorticweb, bbengfort, davidkonigsberg, emptyflash, filipjanitor, immesys, looselycoupled, mchestnut91, murphsp1, oridb, samkumar

btrdb-python's Issues

function naming does not conform to format standards

This issue should be debated.

The method is refreshMeta(), whereas the Go version is RefreshMeta(). In Python, this should be refresh_meta(), which would be more Pythonic. Should the Python library conform to Python naming standards or try to match the Go version?

There are some other issues with spacing, though raising them is probably a bit nitpicky.
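
One way to move to snake_case without breaking existing callers is to keep the old name as a deprecated alias. The sketch below is only an illustration of that idea: the Stream class here is a hypothetical stand-in, not the library's class, and the return value is a placeholder.

```python
import warnings


class Stream:
    """Hypothetical stand-in for a stream object; not the library's class."""

    def refresh_meta(self):
        # The PEP 8 style name carries the real implementation.
        return "refreshed"  # placeholder result for illustration

    def refreshMeta(self):
        # Old camelCase name kept as a thin alias that warns callers.
        warnings.warn(
            "refreshMeta() is deprecated; use refresh_meta()",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.refresh_meta()
```

Existing code keeps working while the warning points users toward the new name.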

Stream search functionality

Search functionality for streams is currently less than ideal.

It would be good to have wildcard or more general purpose string matching available. This could be built into the Python library or could be built into BTrDB. It is probably better to put it into BTrDB.
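
Until something is available server side, wildcard matching could be approximated in the client by filtering collection names that have already been fetched. This is a minimal sketch using the standard library's fnmatch module; the function name match_collections and the sample collection names are assumptions for illustration.

```python
from fnmatch import fnmatchcase


def match_collections(names, pattern):
    """Return the collection names that match a shell-style wildcard pattern.

    '*' matches any run of characters (including '/'), '?' matches one
    character. Matching is case sensitive.
    """
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical collection names, for illustration only.
names = ["USEAST_NOC1/90807", "USEAST_NOC1/90808", "USWEST_NOC2/10001"]
east = match_collections(names, "USEAST_*/9080?")
```

Filtering in the client still requires listing every collection first, which is one reason the issue leans toward doing this inside BTrDB.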

stream.nearest() returns error when no point is found

If you pick a point and then go the wrong way such that there is no data in that direction, BTrDB returns

BTrDBError [401] no such point

This should not be the case. It would be better for the API method to catch the error and return None.
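
The behaviour requested above could look roughly like the following wrapper. It is a sketch under assumptions: BTrDBError here is a stand-in class that carries a numeric code attribute, the code 401 is taken from the error message above, and _EmptyStream is a fake used only to exercise the wrapper.

```python
class BTrDBError(Exception):
    """Stand-in for the library's error type; assumed to expose a code."""

    def __init__(self, code, msg):
        super().__init__("[{}] {}".format(code, msg))
        self.code = code


NO_SUCH_POINT = 401  # code shown in the error message above


def nearest_or_none(stream, *args, **kwargs):
    """Call stream.nearest() and return None when no point exists."""
    try:
        return stream.nearest(*args, **kwargs)
    except BTrDBError as err:
        if err.code == NO_SUCH_POINT:
            return None
        raise  # any other error is still reported to the caller


class _EmptyStream:
    """Fake stream with no data, for illustration only."""

    def nearest(self, time, backward=False):
        raise BTrDBError(NO_SUCH_POINT, "no such point")


result = nearest_or_none(_EmptyStream(), 1500000000000000000, backward=True)
```

Only the "no such point" case is swallowed, so connection or permission errors remain visible.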

problems with parameter types

Parameter types are a bit unforgiving in the library.

Nanoseconds

For example, nanoseconds must be integers. We could simply cast nanosecond parameters to integers within the API methods instead of throwing errors. There could be a performance penalty for doing this.
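
A coercion helper along these lines could sit at the top of each API method. This is a sketch, not the library's code; the name as_nanoseconds and the choice to reject fractional values are assumptions.

```python
import numbers


def as_nanoseconds(value):
    """Coerce a timestamp-like value to an int number of nanoseconds."""
    if isinstance(value, bool):
        # bool is a subclass of int, but is almost certainly a mistake here.
        raise TypeError("nanoseconds must be a number, not bool")
    if isinstance(value, numbers.Integral):
        return int(value)  # also covers numpy integer types
    if isinstance(value, float):
        if not value.is_integer():
            raise ValueError("nanoseconds must be a whole number")
        # Note: floats above 2**53 cannot represent every integer, so a
        # float timestamp may already have lost precision before this call.
        return int(value)
    raise TypeError("nanoseconds must be an int, got %s" % type(value).__name__)
```

The checks are a handful of isinstance calls, so the per-call cost should be small next to a network round trip, though that has not been measured here.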

String encoding as Bytes

There are also some instances where the user has to take a string and turn it into a sequence of bytes in a particular character encoding. This can easily be done inside the API using either of the following to generate b'bytes':

u'something'.encode('utf-8')
bytes(u'something', 'utf-8')
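
Wrapped into a helper, the conversion above could accept either type so callers never need to encode by hand. The name to_bytes is an assumption, not an existing library function.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding str input and passing bytes through."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```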

listCollections() method does not work.

The colls parameter is apparently used for pagination.

def listCollections(self, prefix):
        colls = (prefix,)
        maximum = 10
        got = maximum
        while got == maximum:
            startingAt = colls[-1]
            colls = self.ep.listCollections(prefix, colls, maximum)
            for coll in colls:
                yield coll
            got = len(colls)

However, Python always raises a TypeError:

TypeError: ('blahblahblah',) has type <class 'tuple'>, but expected one of: (<class 'bytes'>, <class 'str'>) for field ListCollectionsParams.startWith
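
The traceback suggests the endpoint expects a single string cursor where the method passes the whole tuple, and startingAt is computed but never used. A possible shape for the fix is sketched below. Everything about the endpoint is an assumption: _FakeEndpoint is a stand-in that returns names at or after the cursor, and the real call's semantics may differ.

```python
def list_collections(endpoint, prefix, page_size=10):
    """Yield collection names under prefix, fetching one page at a time."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2")
    cursor = prefix   # a plain string, not a tuple
    last_seen = None
    while True:
        page = endpoint.listCollections(prefix, cursor, page_size)
        for name in page:
            # If the start position is inclusive, the first item of a page
            # repeats the last item of the previous page; skip it.
            if name != last_seen:
                yield name
        if len(page) < page_size:
            break  # a short page means there is nothing left
        last_seen = cursor = page[-1]


class _FakeEndpoint:
    """Hypothetical endpoint with an inclusive start, for illustration."""

    def __init__(self, names):
        self.names = sorted(names)

    def listCollections(self, prefix, start_with, maximum):
        hits = [n for n in self.names
                if n.startswith(prefix) and n >= start_with]
        return hits[:maximum]


endpoint = _FakeEndpoint(["a/%02d" % i for i in range(25)] + ["b/00"])
found = list(list_collections(endpoint, "a/"))
```

The duplicate check makes the loop tolerate either inclusive or exclusive cursor behaviour.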

better tooling for handling Nanoseconds

In general, handling nanoseconds in Python is not terribly pleasant. It would make sense to create some tooling or helper classes around this issue. As timestamps in nanoseconds are incredibly large, it is easy to get the order of magnitude wrong.

Do we use datetime? Time? Arrow? Pandas?

Some possible helper functions are below.

def ConvertDateTimeToEpoch_ns(dt):
    """
    converts the datetime object to epoch time in nanoseconds
    """
    
    return (int(dt.strftime('%s'))*1e6 + dt.microsecond)*1e3

def ConvertEpoch_nsToDateTime(ns):
    """
    converts the Epoch time in nanoseconds into a datetime object
    """
    
    return (datetime.fromtimestamp(ns/1e9).strftime('%c'))

It would be nice to be able to decrement by an "hour", "minute", or "day".
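
As one possible direction, the helpers above can be written with integer arithmetic only, which avoids float rounding at nanosecond magnitudes and the platform-dependent '%s' format code. This sketch uses only the standard library; the function names, and the choice to treat naive datetimes as UTC, are assumptions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
NS_PER_SECOND = 1000000000


def _timedelta_to_ns(delta):
    # timedelta stores days, seconds and microseconds as exact integers.
    return ((delta.days * 86400 + delta.seconds) * NS_PER_SECOND
            + delta.microseconds * 1000)


def datetime_to_ns(dt):
    """Convert a datetime to integer nanoseconds since the Unix epoch."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return _timedelta_to_ns(dt - EPOCH)


def ns_to_datetime(ns):
    """Convert epoch nanoseconds to a UTC datetime (microsecond precision)."""
    seconds, remainder = divmod(int(ns), NS_PER_SECOND)
    return EPOCH + timedelta(seconds=seconds, microseconds=remainder // 1000)


def shift_ns(ns, **units):
    """Shift a nanosecond timestamp, e.g. shift_ns(t, hours=-1)."""
    return int(ns) + _timedelta_to_ns(timedelta(**units))


t = datetime_to_ns(datetime(2017, 7, 14, 2, 40, tzinfo=timezone.utc))
one_hour_earlier = shift_ns(t, hours=-1)
```

Because datetime only carries microseconds, ns_to_datetime drops the last three digits; keeping timestamps as integers and converting only for display sidesteps that loss.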

stream.changes() is not fully implemented

This method is not fully implemented. Calling it raises a NameError:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-22-c0d61ae0f3fe> in <module>()
----> 1 for x in s.changes(v-1, v, 1):
      2     print(x)

~/anaconda/lib/python3.6/site-packages/btrdb4-4.4.5-py3.6.egg/btrdb4/__init__.py in changes(self, fromVersion, toVersion, resolution)
    428         for crlist, version in crs:
    429             for cr in crlist:
--> 430                 yield ChangeRange.fromProto(cr), version
    431 
    432     def flush(self):

NameError: name 'ChangeRange' is not defined

Import fails if `HOME` not set

In cases where HOME is not set (e.g. in our case, when btrdb is imported in a webapp controlled by supervisorctl), importing btrdb fails at utils/credentials.py:30

CREDENTIALS_PATH = os.path.join(os.environ["HOME"], CONFIG_DIR, CREDENTIALS_FILENAME)

This can be reproduced as follows:

# bash
unset HOME

then

# in python
import btrdb

Which produces:

In [1]: import btrdb                                                                                                                                                   
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-1-36165869e3ee> in <module>
----> 1 import btrdb

~/Preteckt/tools/preteckt_dash/env/lib/python3.6/site-packages/btrdb/__init__.py in <module>
     20 from btrdb.exceptions import ConnectionError
     21 from btrdb.version import get_version
---> 22 from btrdb.utils.credentials import credentials_by_profile, credentials
     23 from btrdb.stream import MINIMUM_TIME, MAXIMUM_TIME
     24 

~/Preteckt/tools/preteckt_dash/env/lib/python3.6/site-packages/btrdb/utils/credentials.py in <module>
     28 CONFIG_DIR = ".predictivegrid"
     29 CREDENTIALS_FILENAME = "credentials.yaml"
---> 30 CREDENTIALS_PATH = os.path.join(os.environ["HOME"], CONFIG_DIR, CREDENTIALS_FILENAME)
     31 
     32 ##########################################################################

/usr/lib/python3.6/os.py in __getitem__(self, key)
    667         except KeyError:
    668             # raise KeyError with the original key value
--> 669             raise KeyError(key) from None
    670         return self.decodevalue(value)
    671 

KeyError: 'HOME'

There is of course a workaround to set the HOME var by hand, but it seems like this shouldn't be a requirement.
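
One possible fix is to resolve the home directory with os.path.expanduser instead of reading os.environ["HOME"] directly. On POSIX systems expanduser falls back to the password database when HOME is unset, and it returns the path unchanged rather than raising if no home directory can be found. The sketch below reuses the constant names from the traceback; the credentials_path function is a hypothetical addition.

```python
import os

CONFIG_DIR = ".predictivegrid"
CREDENTIALS_FILENAME = "credentials.yaml"


def credentials_path():
    """Build the credentials path without requiring HOME to be set."""
    home = os.path.expanduser("~")  # does not raise when HOME is missing
    return os.path.join(home, CONFIG_DIR, CREDENTIALS_FILENAME)
```

Computing the path inside a function also defers the lookup until credentials are actually needed, so a plain import no longer depends on the environment.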
