exxeleron / qpython
Interprocess communication between Python and kdb+
Home Page: http://www.devnet.de
License: Apache License 2.0
Noticed that the metadata was getting dropped during a copy of the object.
q('qlst:1 2 3 4 5')
test = q('qlst') # QList([1, 2, 3, 4, 5])
print test.meta # metadata(adjust_dtype=False, qtype=-7)
test2 = numpy.copy(test)
print test2 # QList([1, 2, 3, 4, 5])
print test2.meta
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-287-74313d31bde4> in <module>()
2 print test2
3 print test.meta
----> 4 print test2.meta
AttributeError: 'QList' object has no attribute 'meta'
test2 = test.__copy__()
print test2 # QList([1, 2, 3, 4, 5])
print test2.meta
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-287-74313d31bde4> in <module>()
2 print test2
3 print test.meta
----> 4 print test2.meta
AttributeError: 'QList' object has no attribute 'meta'
The same thing happens with QTables.
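For context, the numpy mechanism that would let an attribute like meta survive copies is `__array_finalize__`. A minimal stand-alone sketch (not qPython's actual code) of an ndarray subclass whose attribute survives `.copy()`:

```python
import numpy

class MetaArray(numpy.ndarray):
    # Minimal ndarray subclass showing the numpy hook that lets an
    # attribute like `meta` survive views and copies.
    def __new__(cls, input_array, meta=None):
        obj = numpy.asarray(input_array).view(cls)
        obj.meta = meta
        return obj

    def __array_finalize__(self, obj):
        # Called for every new instance numpy derives from `obj`,
        # including the result of .copy(); propagate the attribute.
        self.meta = getattr(obj, 'meta', None)

a = MetaArray([1, 2, 3], meta={'qtype': -7})
b = a.copy()
print(b.meta)  # {'qtype': -7}
```

Note that `numpy.copy(a)` defaults to `subok=False` and returns a plain ndarray regardless, so use `a.copy()` or `numpy.copy(a, subok=True)` to keep the subclass.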
Create tox configuration for convenient testing against multiple Python versions.
My q query can be executed correctly in a q session. But in qPython, sometimes I receive "Error while data decompression", while other times I can query correctly.
What is the meaning of this error?
Sample code:
qc = QConnection(host='localhost', port=5000).open()
# this does not work. tbl is not created as an empty table on the server
qc.query(MessageType.SYNC, "tbl: ([] col1: (); col2: ())")
# this one does work, and creates x on the server
qc.query(MessageType.SYNC, "x: 12345")
Hello,
Is it possible to use qPython with asyncio without resorting to things like run_in_executor (i.e. ThreadPoolExecutor)? Both the synchronous sync and the asynchronous receive calls are blocking, making it impossible to use with asyncio (without running in a separate thread).
Ideally, I would like to do something like the following. Note that I've made up the methods sync_async (yeah, not a good name) and receive_async:
async with qconnection.QConnection(...) as q:
result = await q.sync_async(query, **query_kwargs)
and/or
async with qconnection.QConnection(...) as q:
q.async(query)
result = await q.receive_async()
In fact, there'd be no need for two methods (?) - we just need a method returning a future which we could await:
async with qconnection.QConnection(...) as q:
future = q.async(query)
result = await future
Another related problem I currently face is that all socket reads are blocking, which makes my program very slow (I want to read multiple large results concurrently). I'm not certain of technical details, but it seems like it should be possible to do non-blocking concurrent socket I/O (e.g. asyncore seems like it does the job?).
Thanks!
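Until native asyncio support exists, the interim pattern the reporter wants to avoid is still the usual one: push the blocking call onto a thread pool so the event loop stays free. A self-contained sketch with a stand-in for the blocking q.sync call (the real QConnection would go where blocking_query is):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_query(query):
    # Stand-in for a blocking q.sync(query) call; hypothetical,
    # substitute a real QConnection here.
    return len(query)

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each blocking call runs on its own thread, so several large
        # reads could proceed concurrently without blocking the loop.
        result = await loop.run_in_executor(pool, blocking_query, 'til 10')
    return result

print(asyncio.run(main()))  # 6
```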
Steps for Reproduction:
Create a connection to a Q process from python, then close it:
c = QConnection(host='localhost',port=5000)
c.open()
c.close()
On the Q process, inspect .z.W, the socket will still be open.
Issue: the file handle associated with the Reader object has not been closed. As outlined here: https://docs.python.org/3.5/library/socket.html#socket.socket.close, the file handle associated with the socket is not released until all file objects from makefile() are closed.
Remediation step:
c._reader._stream.close()
Suggested change: move the creation of the file object to QConnection._init_socket and also ensure it is closed in QConnection.close.
Version Used: Python 3.5.1 and QPython 1.2
I'm not clear about what exactly causes this, but here is how to reproduce it:
When a query executed via QConnection.sync() is interrupted (e.g. via ctrl+c), the next query on the same QConnection handle returns the results of the previous query.
example:
qc = qconnection.QConnection(host, port=port, username=username)
qc.open()
res1 = qc('select * from trade where sid=1', pandas=True)
res2 = qc('select * from trade where sid=2', pandas=True)  # interrupt with ctrl+c
then run again:
res1 = qc('select * from trade where sid=1', pandas=True)
res2 = qc('select * from trade where sid=2', pandas=True)
res1 will have the results for res2, or res2 will have the results for res1, or both.
This is true for qpython version '1.1.0b3' and pandas version '0.16.2' running on Linux.
If you do qc.close(), qc.open() before each call to sync(), the problem disappears.
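The close/reopen remediation above can be wrapped so an interrupted query can never leave a stale response on the socket. A sketch using a stub connection instead of a live q server (StubConnection is hypothetical; a real QConnection would be passed in):

```python
class StubConnection:
    # Stand-in for qpython.qconnection.QConnection (no server needed).
    def __init__(self):
        self.reopened = False

    def sync(self, query, **kwargs):
        raise KeyboardInterrupt  # simulate ctrl+c mid-query

    def close(self):
        pass

    def open(self):
        self.reopened = True

def safe_sync(qc, query, **kwargs):
    # If the query is interrupted, reset the connection so the next
    # call cannot read the previous query's response off the socket.
    try:
        return qc.sync(query, **kwargs)
    except BaseException:
        qc.close()
        qc.open()
        raise

qc = StubConnection()
try:
    safe_sync(qc, 'select * from trade where sid=2')
except KeyboardInterrupt:
    pass
print(qc.reopened)  # True
```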
It seems that np.datetime64 assumes UTC when KDB is showing a local timestamp. Any way to get around this?
In [67]: conn.sync("string .z.Z")
Out[67]: '2015.08.25T18:03:28.368'
In [69]: print conn.sync(".z.Z")
2015-08-25T14:03:36.137-0400 [metadata(qtype=-15)]
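One workaround, assuming you know the q server's timezone (America/New_York is an assumption here that matches the -0400 offset above), is to treat the .z.Z value as server-local time and convert it to UTC yourself; querying .z.z, which is already UTC, avoids the ambiguity entirely. A stdlib-only sketch:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# .z.Z is the q server's local time, while numpy.datetime64 treats
# naive values as UTC. Localize the raw value, then convert:
local = datetime(2015, 8, 25, 18, 3, 28, 368000,
                 tzinfo=ZoneInfo('America/New_York'))
utc = local.astimezone(timezone.utc)
print(utc.isoformat())  # 2015-08-25T22:03:28.368000+00:00
```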
Is there a workaround for passing more than 8 parameters?
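q functions accept at most 8 parameters, so the usual workaround is to pack everything into a single list or dictionary and index into it on the q side. A sketch (the q lambda and value names are illustrative, not from this repo):

```python
# 10 values, over q's limit of 8 parameters for a single function call:
params = ['IBM', '2015.01.01', '2015.12.31', 1, 2, 3, 4, 5, 6, 7]

# Passing them as one list means only one parameter goes over the wire;
# the q side unpacks by index, e.g. (hypothetical):
#   q.sync('{[p] doWork[p 0; p 1; p 2]}', params)
print(len(params))  # 10
```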
Hi, Masters:
I'm trying to test qpython connecting to a TickerPlant; the script "publisher.py" demonstrates the function.
But while running it, I got an "Unable to get repr for <class 'main.PublisherThread'>" message. Do you know what caused it and how to fix it?
Thanks
Zheng
from https://docs.python.org/3.8/whatsnew/3.7.html?highlight=reserved%20words:
Backwards incompatible syntax changes:
async and await are now reserved keywords.
This prevents us from importing qpython, as it complains that
def async(self, query, *parameters, **options):
is invalid syntax.
It would be nice if there was a setting which allowed getting data as str, rather than bytes, in Python 3 when used with pandas=True. I'm aware of the discussion in #35 and the related example of a custom reader here, so it's definitely possible to achieve this. It just seems like there are quite a few things you need to override to handle all the cases, which makes it very easy to get wrong. If this is not going to be officially supported out of the box, then at least a recipe with a QReader implementation which handles all the possible cases would be extremely helpful.
I feel like Python 3 users would prefer str over bytes in the vast majority of cases, and using a pandas DataFrame is essential in most scientific applications. If anything, at first I was naively expecting that passing encoding to QConnection would do exactly that: if you don't pass any encoding you get bytes, but if you do, you get decoded strings.
Thanks!
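Until this is supported out of the box, one blunt workaround is to decode byte columns after the fact rather than inside the reader. A stdlib-only sketch (column names are hypothetical):

```python
def decode_column(values, encoding='utf-8'):
    # Post-process a qPython result column: decode byte strings to str,
    # leaving other values (floats, None, ...) untouched.
    return [v.decode(encoding) if isinstance(v, bytes) else v
            for v in values]

# With pandas=True you could apply it per column, e.g.:
#   df['sym'] = decode_column(df['sym'])
print(decode_column([b'IBM', b'MSFT', None]))  # ['IBM', 'MSFT', None]
```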
Hi,
Not an issue, but just curious whether qpython can support a tcps connection?
h:hopen :tcps://hostname:port[:username:password]
I would like to connect to a TLS-enabled q server from a Windows client that has OpenSSL installed. If it's not already supported, is it possible to wrap the socket connection in a TLS wrapper?
Thanks
qPython raises an error while deserializing a table containing a char column.
Test case:
flip `name`iq`grade!(`Dent`Beeblebrox`Prefect;98 42 126;"a c")
Table meta:
c | t f a
-----| -----
name | s
iq | j
grade| c
Results in:
Traceback (most recent call last):
File "D:\dev\workspace\qPython\samples\console.py", line 39, in <module>
result = q(x)
File "D:\dev\workspace\qPython\qpython\qconnection.py", line 174, in __call__
return self.sync(parameters[0], *parameters[1:])
File "D:\dev\workspace\qPython\qpython\qconnection.py", line 146, in sync
response = self.receive(data_only = False)
File "D:\dev\workspace\qPython\qpython\qconnection.py", line 169, in receive
result = self._reader.read(raw)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 101, in read
message.data = self.read_data(message.size, raw, message.is_compressed)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 151, in read_data
return raw_data if raw else self._read_object()
File "D:\dev\workspace\qPython\qpython\qreader.py", line 160, in _read_object
return reader(self, qtype)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 268, in _read_table
return qtable(columns, data, qtype = QTABLE)
File "D:\dev\workspace\qPython\qpython\qcollection.py", line 126, in qtable
table = numpy.core.records.fromarrays(data, names = ','.join(columns))
File "C:\Python27\lib\site-packages\numpy\core\records.py", line 562, in fromarrays
raise ValueError("array-shape mismatch in array %d" % k)
ValueError: array-shape mismatch in array 2
Hello,
I am trying to use qpython for the first time. One of my use cases is to run a query and dump the data to a CSV file. Is there an easy way to do so? I can write individual rows, but how do I get the column names?
Many thanks.
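A QTable behaves like a numpy record array, so the column names are available as result.dtype.names and the rows iterate as tuples; with pandas=True the result is a DataFrame and df.to_csv() writes the header for you. A stdlib sketch of the record-array route, with stand-in values where a live query result would be:

```python
import csv
import io

columns = ('sym', 'price')               # stand-in for result.dtype.names
rows = [('IBM', 1.0), ('MSFT', 2.0)]     # stand-in for iterating the result

buf = io.StringIO()                      # use open('out.csv', 'w', newline='') for a file
writer = csv.writer(buf)
writer.writerow(columns)                 # header row with the column names
writer.writerows(rows)
print(buf.getvalue().splitlines()[0])    # sym,price
```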
Hi there!
I'm currently trying to get the Twisted integration example you provided to work, but I'm running into a few issues.
First of all: when I try to connect to kdb, the line self.transport.write(self.credentials + '\3\0') throws a TypeError telling me "Data must not be unicode". So I tried changing it to self.transport.write(str(self.credentials + '\0').encode("utf-8")) (not the most elegant solution, I know) and the error disappeared... But as it usually goes, another showed up.
This time a TypeError with the message "not all arguments converted during string formatting" popped up, and I cannot find where it is coming from. Is this about data coming from kdb that is not being converted, or outgoing data?
The kdb database I'm currently working with has no users, so I did not provide a username and password: factory = IPCClientFactory('', '', onConnectSuccess, onConnectFail, onMessage, onError).
Could that be in the way of success? I'm a bit clueless...
Thanks in advance!
When trying to pass from pandas to q, the time index in the DataFrame (the Date column) doesn't quite make it to the q side:
import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5] # explore first 5 rows of the DataFrame
# Out:
# Open High Low Close Volume Adj Close
# Date
#2010-01-04 10.17 10.28 10.05 10.28 60855800 9.43
#2010-01-05 10.45 11.24 10.40 10.96 215620200 10.05
#2010-01-06 11.21 11.46 11.13 11.37 200070600 10.43
#2010-01-07 11.46 11.69 11.32 11.66 130201700 10.69
#2010-01-08 11.67 11.74 11.46 11.69 130463000 10.72
q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server
# Out:
# Open High Low Close Volume Adj Close
#0 10.17 10.28 10.05 10.28 60855800 9.43
#1 10.45 11.24 10.40 10.96 215620200 10.05
#2 11.21 11.46 11.13 11.37 200070600 10.43
#3 11.46 11.69 11.32 11.66 130201700 10.69
#4 11.67 11.74 11.46 11.69 130463000 10.72
In any case, thank you for the excellent qPython package!
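The DatetimeIndex is not serialized as a column, so one workaround is to promote it to an ordinary column before sending. A sketch (pandas only; the final q call is commented out since it needs a live server):

```python
import pandas as pd

# Reconstruct a frame shaped like the Yahoo download above:
f = pd.DataFrame({'Open': [10.17, 10.45]},
                 index=pd.to_datetime(['2010-01-04', '2010-01-05']))
f.index.name = 'Date'

f = f.reset_index()          # 'Date' becomes a regular column
print(list(f.columns))       # ['Date', 'Open']
# q('set', numpy.string_('yahoo'), f) would now include the Date column
```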
On behalf of David Roberts as reported in https://groups.google.com/forum/#!topic/exxeleron/yDTruQw8wvw
qwriter does not flush buffers resulting in any sync or async query being able to stall.
Line 80 in qwriter needs to be sendall, not send, or Python does not dispatch and flush the whole buffer.
To reproduce, run on a Linux server with default network buffers a few ms distant from the q server (the Windows socket stack doesn't seem to exhibit the same behaviour):
import random
import string

from qpython import qconnection, qtype
from qpython.qcollection import qlist

def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

text = []
for a in range(1000):
    text.append(id_generator(100))

q = qconnection.QConnection()
q.open()
print q("1b")
print q.sync("{count x}", qlist(range(100000), qtype=qtype.QINT_LIST))
print q.sync("{count x}", qlist(text, qtype=qtype.QSYMBOL_LIST))
print q("1b")
Hi,
I once opened an issue about strings in qPython being returned as bytes rather than unicode strings in Python 3:
#35
The issue was closed after some improvements to the QReader were implemented.
I think there is still room for improvement, as even with the current code I struggle to override the behaviour.
The problem is that I want to use PandasQReader.
Now, PandasQReader inherits from QReader, so I can't really inject my class in between; I would have to copy PandasQReader so that it inherits from my class.
This is similar to the stream framework in Java: there, each stream class introduces some new behaviour, and the way to make them work together is to wrap one inside the other, rather than making them inherit from a particular one.
So if PandasQReader took a QReader in its constructor, I could pass my modified class at runtime instead of the default one.
Unless I have missed something.
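Besides constructor injection, Python also allows splicing behaviour into an existing hierarchy with a mixin and cooperative super(). A sketch with stand-in classes (deliberately not the real QReader API):

```python
class Reader:
    # stands in for QReader
    def read_symbol(self):
        return b'abc'

class PandasReader(Reader):
    # stands in for PandasQReader
    pass

class DecodeMixin:
    def read_symbol(self):
        # cooperative override: decode whatever the next class in the
        # MRO returns, without copying any of its code
        return super().read_symbol().decode('utf-8')

class DecodingPandasReader(DecodeMixin, PandasReader):
    pass

print(DecodingPandasReader().read_symbol())  # abc
```

Whether this works against the real classes depends on how QReader dispatches its type readers, so treat it as a design sketch rather than a drop-in fix.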
When reading from KDB into a Python 3 pandas DataFrame, the documentation says:
symbol (KDB) => numpy.bytes_ (Python)
string (KDB) => bytes (Python)
However, I found that symbol is also read as bytes with pandas=True.
More importantly, when writing a pandas DataFrame to KDB, I tested the type conversion:
str (Python) => symbol (KDB)
numpy.bytes_ (Python) => symbol (KDB)
bytes (Python) => symbol (KDB)
In a plain list, bytes (Python) is correctly converted to string (KDB).
But the bug shows for pandas DataFrames.
Hi - not an issue, just a support request but I do not see details of support.
Let's say I want to call a remote function from Python and pass it a dictionary. Inside the dict I have an entry 'starttime' for which I pass a QTemporal; all good.
Now I do not want to pass a concrete time; I want to pass .z.p, i.e. have the server use the current time. Is that possible in qpython?
Thanks.
Not sure if this is even possible given the IPC protocol used by kdb, but is there a way to transparently cast from q's temporals into numpy datetimes? Preferably a flag in QConnection.query, or a global config, that would automatically cast a query's temporal results to numpy datetimes.
I've made a flask-kdb extension using qpython, which for the record was a dream to work with (qpython and flask). To test out its design, I made an in-browser REPL link.
In doing so, the biggest issue I've had is the rendering of tables with datetimes in them:
I'd welcome some comments on what to do to resolve this. My current thinking is, if it's a kdb thing, the utility to convert the column could be as simple as running a \t <whatever> to get the metadata I need.
Feel free to close this issue if this is a purely kdb thing that you can't fix.
On Python 2.7: I'm trying to send non-ASCII (utf-8) data to a q process. It works fine when sending the data as a QSYMBOL (using str.encode('utf-8')), but always fails when trying to send the data as a QSTRING.
If I send the data as a unicode string I get:
QWriterException: Unable to serialize type: <type 'unicode'>
in line 119 of qwriter.py.
If I convert the unicode string to a str type (using str.encode('utf-8')) I get:
UnicodeDecodeError: 'ascii' codec can't decode byte [...]: ordinal not in range(128).
in line 167 of qwriter.py.
Setting encoding to 'utf-8' when initialising the connection doesn't help either.
Do I need to write a custom _write_string override to get this to work properly?
Hi,
I am using Python 3 and when I query my employer's kdb server I get back a lot of QSYMBOL and QSYMBOL_LIST values, which are converted to numpy.string_, which as I understand is just bytes.
This is really annoying, as the rest of my code uses plain Python 3 strings.
Would it be possible for the user to specify an encoding and have them converted to str?
Maybe using the QReader mapping mechanism? Is it private, or can it be overwritten?
Test case:
(42;::;`foo)
qPython fails to deserialize a symbol list with a null as its first element.
Test cases:
``
``abc
Results in:
Traceback (most recent call last):
File "D:\dev\workspace\qPython\samples\console.py", line 39, in <module>
result = q(x)
File "D:\dev\workspace\qPython\qpython\qconnection.py", line 174, in __call__
return self.sync(parameters[0], *parameters[1:])
File "D:\dev\workspace\qPython\qpython\qconnection.py", line 146, in sync
response = self.receive(data_only = False)
File "D:\dev\workspace\qPython\qpython\qconnection.py", line 169, in receive
result = self._reader.read(raw)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 101, in read
message.data = self.read_data(message.size, raw, message.is_compressed)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 151, in read_data
return raw_data if raw else self._read_object()
File "D:\dev\workspace\qPython\qpython\qreader.py", line 162, in _read_object
return self._read_list(qtype)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 226, in _read_list
symbols = self._buffer.get_symbols(length)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 387, in get_symbols
raise QReaderException('Failed to read symbol from stream')
qpython.qreader.QReaderException: Failed to read symbol from stream
Enforcing the data type as QFLOAT_LIST doesn't seem to work, as the KDB+ result is real (e), not float (f). Am I missing something?
from pandas import Series,DataFrame
import numpy as np
import qpython.qconnection as qconnection
from qpython import MetaData
from qpython.qtype import QFLOAT_LIST
d = {'col' : Series([1., 2.2, 3.45, 4.6564])}
tbl = DataFrame(d)
tbl[['col']] = tbl[['col']].astype(np.float)
tbl.meta = MetaData(col = QFLOAT_LIST)
q = qconnection.QConnection(host = '10.10.4.220', port = 5000, pandas = True)
q.open()
q('set', np.string_('tbl'), tbl)
q('meta tbl')
# t f a
# c
# col e
Thank you
Code sample:
q.async(".gw.asyncfunc", QLambda("raze"))
raze is not a lambda:
q)type raze
107h
The user's code should be:
q.async(".gw.asyncfunc", QLambda("{raze x}"))
Solution: add validation to QLambda to ensure that the expression is enclosed in { and }.
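The proposed validation, sketched as stand-alone code (not the library's actual implementation):

```python
def validate_lambda(expression):
    # Reject expressions that are not enclosed in braces, since only
    # {...} definitions are q lambdas (type 100h).
    expression = expression.strip()
    if not (expression.startswith('{') and expression.endswith('}')):
        raise ValueError('QLambda expression must be enclosed in { and }')
    return expression

print(validate_lambda('{raze x}'))   # {raze x}
# validate_lambda('raze') would raise ValueError
```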
I am currently using Python 2.7.12 with qpython 1.2.2 and numpy 1.10.4.
Running the following script does not send a NaN float to KDB+ but instead an empty string.
from qpython.qtype import qnull,QDOUBLE,QFLOAT,_QNAN64
from qpython import qconnection
import numpy
q = qconnection.QConnection(host = 'localhost', port = 5010)
q.open()
q.sync('foo',numpy.string_('test_trade'),[numpy.string_('PYTHON'),qnull(QDOUBLE)])
or
q.sync('foo',numpy.string_('test_trade'),[numpy.string_('PYTHON'),_QNAN64])
KDB+ 3.3 2015.11.03:
q)foo:{[x;y] show y}
q)IBM " "
2nd example, q.sync('foo',numpy.string_('test_trade'),[numpy.string_('IBM'),qnull(-7)]):
q)IBM
0N
Seems like a bug when trying to send a null float datatype.
Attempting to write a DataFrame/Series with an integer index that does not contain zero will throw an exception:
df = DataFrame({'k':[1,2,3],'v':['a','b','c']}).set_index('k')
q.sync('`test set',df)
File "C:\Anaconda3\envs\pyq\lib\site-packages\qpython\_pandas.py", line 157, in _write_pandas_series
qtype = Q_TYPE.get(type(data[0]), QGENERAL_LIST)
KeyError: 0
This is caused by the data[0] lookup. With a non-numeric index data[0] will return the first row as expected, however when the index is numeric it will search for 0 and find nothing. Explicit indexing by row number (via iloc) fixes the issue.
_pandas.py:157, before:
qtype = Q_TYPE.get(type(data[0]), QGENERAL_LIST)
after:
qtype = Q_TYPE.get(type(data.iloc[0]), QGENERAL_LIST)
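The label-vs-position distinction behind this bug can be reproduced without qPython at all; a small sketch:

```python
import pandas as pd

# Integer index that does not contain 0, as in the report:
s = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])

caught = False
try:
    s[0]            # label-based lookup: there is no label 0 -> KeyError
except KeyError:
    caught = True

print(s.iloc[0])    # a  (positional lookup is unaffected)
```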
Running qPython 1.1 dev fails when running
print q.sync('{til x}', 10)
It generates the following error:
print(q.sync('{til x}', 10))
File "C:\Python3403_64\lib\site-packages\qpython\qconnection.py", line 268, in sync
response = self.receive(data_only = False, **options)
File "C:\Python3403_64\lib\site-packages\qpython\qconnection.py", line 341, in receive
result = self._reader.read(**self._options.union_dict(**options))
File "C:\Python3403_64\lib\site-packages\qpython\qreader.py", line 151, in read
message.data = self.read_data(message.size, message.is_compressed, **options)
File "C:\Python3403_64\lib\site-packages\qpython\qreader.py", line 228, in read_data
return raw_data if options.raw else self._read_object(options)
File "C:\Python3403_64\lib\site-packages\qpython\qreader.py", line 237, in _read_object
return reader(self, qtype, options)
File "C:\Python3403_64\lib\site-packages\qpython\qreader.py", line 248, in _read_error
raise QException(self._read_symbol(options = options))
qpython.qtype.QException: b'type'
Environment: Windows 7 64 bit, Python 3.4, qpython 1.1 dev
This query works with Python 2.7, qpython 1.0.
Kind regards,
David Bieber
Environment:
centos7 x64
kdb+ 3.5
python 3.5
Operations:
It happens that client [B]'s receive call does not work (it blocks). However, after I replace sync with async it works well.
We encountered the recently fixed null char issue.
If the QConnection attributes are bad (i.e. the q server isn't up or the port number is wrong), it's handled incorrectly. It should only take an error catch in https://github.com/exxeleron/qPython/blob/master/qpython/qconnection.py#L115-L119 to fix the issue.
# kdb is running on port 5000
In [1]: from qpython import qconnection
In [2]: q = qconnection.QConnection('localhost', 5000)
In [3]: q.open()
In [4]: q.is_connected()
Out[4]: True
In [5]: q = qconnection.QConnection('localhost', 6000)
In [6]: q.open()
---------------------------------------------------------------------------
error Traceback (most recent call last)
<ipython-input-6-486c86d6f873> in <module>()
----> 1 q.open()
/Users/wjm/repos/qPython/qpython/qconnection.pyc in open(self)
106 raise QConnectionException('Host cannot be None')
107
--> 108 self._init_socket()
109 self._initialize()
110
/Users/wjm/repos/qPython/qpython/qconnection.pyc in _init_socket(self)
116 '''Initialises the socket used for communicating with a q service,'''
117 self._connection = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
--> 118 self._connection.connect((self.host, self.port))
119 self._connection.settimeout(self.timeout)
120
/Users/wjm/anaconda/envs/flask-kdb/lib/python2.7/socket.pyc in meth(name, self, *args)
222
223 def meth(name,self,*args):
--> 224 return getattr(self._sock,name)(*args)
225
226 for _m in _socketmethods:
error: [Errno 61] Connection refused
In [7]: q.is_connected()
Out[7]: True
I found the time wrong in the example console.py.
In q:
q).z.z
2016.06.15T22:49:42.868
q).z.Z
2016.06.16T10:49:47.925
In qpython:
Q).z.z
2016-06-16T06:50:16.811+0800 [metadata(qtype=-15)]
Q).z.Z
2016-06-16T18:50:21.444+0800 [metadata(qtype=-15)]
Hi,
Having an issue with nested lists: is there specific qtable metadata that needs adding to the query call to allow numpy to take in nested lists?
cheers,
David
qConnection("([] a:1000#`a; b:1000#enlist til 10)")
ValueError: array-shape mismatch in array 1
/usr/local/lib/python2.7/dist-packages/numpy-1.9.0-py2.7-linux-x86_64.egg/numpy/core/records.py(562)fromarrays()
561 if testshape != shape:
--> 562 raise ValueError("array-shape mismatch in array %d" % k)
563
Is it possible to put a fix in the next release to use frombuffer when converting the raw q data in qreader.py?
lib\site-packages\qpython\qreader.py:299: DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead
data = numpy.fromstring(raw, dtype = conversion)
Looks like recently (26 Dec) @SKolodynski committed these long-awaited fixes for async, fromstring and some other minor but nasty stuff.
Is there any chance that a new release with these fixes included will soon be made on pip?
Quick question: how do I insert data and convert from something like a pandas dataframe?
I can do the following:
q('(myHeader:5#myBars)')
df = q('(myHeader)')
print df
print df.meta
print q('type', df)
but then if I want to insert the data just returned I run into problems. I am probably not doing this correctly.
q.query(qconnection.MessageType.SYNC, 'myHeader2:', df)
I've tried other ways to insert a pandas dataframe as well. No luck. Any help is greatly appreciated.
Jebadiah
Came across the following using Python 2.7.10 |Anaconda 2.2.0 (64-bit) querying kdb 3.1 l64
r = query_kdb.query(REGION, '([]2#.z.d)')
20150728 15:31:37.612 INFO IPC version: 3. Is connected: True
20150728 15:31:37.612 INFO Querying KDB. port:[kdbtest], host:[1234], username:[user], timeout:[4.0].
20150728 15:31:37.612 INFO Query: ([]2#.z.d)
r
QTable([(5687,), (5687,)],
dtype=[('d', '<i4')])
r[0]
(5687,)
type(r[0])
<class 'numpy.core.records.record'>
r0result = query_kdb.query(REGION, '.z.d')
20150728 15:24:26.910 INFO IPC version: 3. Is connected: True
20150728 15:24:26.910 INFO Querying KDB. port:[kdbtest], host:[1234], username:[user], timeout:[4.0].
20150728 15:24:26.910 INFO Query: .z.d
result
<qpython.qtemporal.QTemporal object at 0x7f67c929e3d0>
It seems the type remains an integer for date columns?
I've noticed that queries for large data sets fail. My example is a 69 million row table with 5 columns.
The IPC for this table from q to q works fine. But qPython fails. The actual error seems to be an infinite hang and the call to q.sync() never returns.
I was able to create a fix, which I'll explain as I detail the two problems I found.
Problem 1: Message Size Overflow
I put some debugging statements into qreader.py and found that in QMessage, the read_header() function has the following code: message_size = self._buffer.get_int(). This call returns a negative number for my query, which I'm guessing means that the size read from the IPC message was read as signed and overflowed. I added a get_uint() function to BytesBuffer to get an unsigned integer for the message size, which gets me a positive size as I would expect:
def get_uint(self):
    return self.get('I')
This solves problem 1.
Problem 2: Socket Read Length Too Big
After fixing problem 1, a failure happens in QReader.read_data()
The following line of code creates an OverflowError: signed integer is greater than maximum
:
raw_data = self._read_bytes(message_size - 8)
However, I was able to resolve the problem by adding from StringIO import StringIO
and changing _read_bytes()
to do the following:
def _read_bytes(self, length):
    if not self._stream:
        raise QReaderException('There is no input data. QReader requires either stream or data chunk')
    if length == 0:
        return b''
    else:
        CHUNKSIZE = 2048
        remaining = length
        buff = StringIO()
        while remaining > 0:
            chunk = self._stream.read(min(remaining, CHUNKSIZE))
            if chunk:
                remaining = remaining - len(chunk)
                buff.write(chunk)
            else:
                break
    data = buff.getvalue()
    if len(data) == 0:
        raise QReaderException('Error while reading data')
    return data
This seems to stem from the fact that you can't ask a file-like object to read more than a signed-integer number of bytes. To get around that, I read in chunks and combine the chunks using a StringIO.
I did no work to optimize CHUNKSIZE, and I also do not know whether StringIO and chunked reads are the best way to go about this.
A better option, and closest to the original implementation, might be something like (pseudo-code):
if length <= MAX_READABLE_LENGTH:
    data = read_in_one_shot_as_before()
else:
    data = read_in_chunks_as_proposed()
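The chunked half of that pseudo-code can be sketched as stand-alone runnable Python (Python 3 / BytesIO rather than StringIO; this is an illustration, not the library's code):

```python
import io

CHUNKSIZE = 2048  # untuned, as in the report

def read_bytes_chunked(stream, length):
    # Read `length` bytes in bounded chunks and reassemble, so no
    # single read() call exceeds what the platform can handle.
    buf = io.BytesIO()
    remaining = length
    while remaining > 0:
        chunk = stream.read(min(remaining, CHUNKSIZE))
        if not chunk:
            break  # short read: the stream ended early
        remaining -= len(chunk)
        buf.write(chunk)
    return buf.getvalue()

data = read_bytes_chunked(io.BytesIO(b'x' * 5000), 5000)
print(len(data))  # 5000
```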
I'm curious to know what you think, and if you would try to address this. With the somewhat recent increase of the IPC limit to 1TB, I'm guessing this will happen more and more.
Thanks,
Derek
qPython cannot represent a q dictionary where:
Sample:
`abc`def`gh!([] one: 1 2 3; two: 4 5 6)
This is more a question than an issue. I would like to know if you have an implementation or patch for Kona (https://github.com/kevinlawler/kona).
Thanks!
Hi,
Thank you for this fantastic tool. I'm just wondering if there is a plan to upgrade and support IPC 3.4? From what I understand, there is currently a data size limitation of 2GB in IPC version 3. I know they have increased the data size to 1TB in IPC version 3.4. There is an increasing need for data size to go beyond 2GB size limit.
Thank you very much in advance
Jeffrey
The uncompress step on larger results is very slow.
The test query result is a 20,000 row table with bid/ask columns. From a q IDE this takes <200ms.
qPython receives the data from the server quickly (<100ms), but the uncompress itself takes 4-5 seconds.
Uncompression is handled in qreader.py, the _uncompress function called by read_data.
Same test in Java and the uncompress time is negligible.
Details:
I also tried the latest 0.21.0, which was just released, with the same result:
In [8]: pd.__version__
Out[8]: u'0.21.0'
Demonstration of failure:
In [1]: import qpython.qconnection
In [2]: qc = qpython.qconnection.QConnection(host="****", port=32423)
In [3]: qc.open()
In [4]: qc.sync('([]date:2017.01.01 0N 0Wd)', pandas=True)
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: -5877611-06-21 00:00:00
Demonstration of proper functionality when infinity is removed:
In [5]: qc.sync('([]date:2017.01.01 0Nd)', pandas=True)
Out[5]:
date
0 2017-01-01
1 NaT
According to the release notes (http://pandas.pydata.org/pandas-docs/version/0.20.3/whatsnew.html) pandas has added bounds checking to pd.Timestamp()
Demonstration of it working in 0.19 (in a manner: it still overflows to some date generally less than today, which is probably undesirable, albeit not crashing):
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: u'0.19.2'
In [3]: import qpython.qconnection
In [4]: qc = qpython.qconnection.QConnection(host="*****", port=32423)
In [5]: qc.open()
In [6]: qc.sync('([]date:2017.01.01 0N 0Wd)', pandas=True)
Out[6]:
date
0 2017-01-01 00:00:00.000000000
1 NaT
2 1834-02-05 07:42:50.670153728
This also applies to negative infinity, and I presume to the other temporal types as well.
For pandas, dates will have to be bounded by the following range, or this exception will be thrown:
In [7]: pd.Timestamp.min
Out[7]: Timestamp('1677-09-21 00:12:43.145225')
In [8]: pd.Timestamp.max
Out[8]: Timestamp('2262-04-11 23:47:16.854775807')
The documentation does not address how to send replies to inbound synchronous KDB calls. For example, if the python code registers with a KDB server and then receives synchronous requests to perform some function and send a reply back. Sending back a response via a standard async call doesn't seem to work.
When using the pandas reader, all strings are converted to byte strings. Overwriting the QReader doesn't solve it.
I took the new method from here: https://qpython.readthedocs.io/en/latest/usage-examples.html?highlight=StringQReader