exxeleron / qpython
Interprocess communication between Python and kdb+
Home Page: http://www.devnet.de
License: Apache License 2.0
Noticed that the metadata was getting dropped during a copy of the object.
q('qlst:1 2 3 4 5')
test = q('qlst') # QList([1, 2, 3, 4, 5])
print test.meta # metadata(adjust_dtype=False, qtype=-7)
test2 = numpy.copy(test)
print test2 # QList([1, 2, 3, 4, 5])
print test2.meta
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-287-74313d31bde4> in <module>()
2 print test2
3 print test.meta
----> 4 print test2.meta
AttributeError: 'QList' object has no attribute 'meta'
test2 = test.__copy__()
print test2 # QList([1, 2, 3, 4, 5])
print test2.meta
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-287-74313d31bde4> in <module>()
2 print test2
3 print test.meta
----> 4 print test2.meta
AttributeError: 'QList' object has no attribute 'meta'
The same thing happens with QTables.
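For context, the numpy mechanism that would let an attribute like meta survive copies is `__array_finalize__`. A minimal stand-alone sketch (not qPython's actual code) of an ndarray subclass whose attribute survives `.copy()`:

```python
import numpy

class MetaArray(numpy.ndarray):
    # Minimal ndarray subclass showing the numpy hook that lets an
    # attribute like `meta` survive views and copies.
    def __new__(cls, input_array, meta=None):
        obj = numpy.asarray(input_array).view(cls)
        obj.meta = meta
        return obj

    def __array_finalize__(self, obj):
        # Called for every new instance numpy derives from `obj`,
        # including the result of .copy(); propagate the attribute.
        self.meta = getattr(obj, 'meta', None)

a = MetaArray([1, 2, 3], meta={'qtype': -7})
b = a.copy()
print(b.meta)  # {'qtype': -7}
```

Note that `numpy.copy(a)` defaults to `subok=False` and returns a plain ndarray regardless, so use `a.copy()` or `numpy.copy(a, subok=True)` to keep the subclass.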
Create tox configuration for convenient testing against multiple Python versions.
My q query can be executed correctly in a q session. But in qPython, sometimes I receive "Error while data decompression", while other times I can query correctly.
What is the meaning of this error?
Sample code:
qc = QConnection(host='localhost', port=5000).open()
# this does not work. tbl is not created as an empty table on the server
qc.query(MessageType.SYNC, "tbl: ([] col1: (); col2: ())")
# this one does work, and creates x on the server
qc.query(MessageType.SYNC, "x: 12345")
Hello,
Is it possible to use qPython with asyncio without resorting to things like run_in_executor (i.e. ThreadPoolExecutor)? Both the synchronous sync and the asynchronous receive calls are blocking, making it impossible to use with asyncio (without running in a separate thread).
Ideally, I would like to do something like the following. Note that I've made up the methods sync_async (yeah, not a good name) and receive_async:
async with qconnection.QConnection(...) as q:
result = await q.sync_async(query, **query_kwargs)
and/or
async with qconnection.QConnection(...) as q:
q.async(query)
result = await q.receive_async()
In fact, there'd be no need for two methods (?) - we just need a method returning a future which we could await:
async with qconnection.QConnection(...) as q:
future = q.async(query)
result = await future
Another related problem I currently face is that all socket reads are blocking, which makes my program very slow (I want to read multiple large results concurrently). I'm not certain of technical details, but it seems like it should be possible to do non-blocking concurrent socket I/O (e.g. asyncore seems like it does the job?).
Thanks!
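Until native asyncio support exists, the interim pattern the reporter wants to avoid is still the usual one: push the blocking call onto a thread pool so the event loop stays free. A self-contained sketch with a stand-in for the blocking q.sync call (the real QConnection would go where blocking_query is):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_query(query):
    # Stand-in for a blocking q.sync(query) call; hypothetical,
    # substitute a real QConnection here.
    return len(query)

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each blocking call runs on its own thread, so several large
        # reads could proceed concurrently without blocking the loop.
        result = await loop.run_in_executor(pool, blocking_query, 'til 10')
    return result

print(asyncio.run(main()))  # 6
```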
Steps for Reproduction:
Create a connection to a Q process from python, then close it:
c = QConnection(host='localhost',port=5000)
c.open()
c.close()
On the Q process, inspect .z.W, the socket will still be open.
Issue: the file handle associated with the Reader object has not been closed. As outlined here: https://docs.python.org/3.5/library/socket.html#socket.socket.close, the file handle associated with the socket is not released until all file objects from makefile() are closed.
Remediation step:
c._reader._stream.close()
Suggested change: move the creation of the file object to QConnection._init_socket and also ensure it is closed in QConnection.close.
Version Used: Python 3.5.1 and QPython 1.2
I'm not clear about what exactly causes this, but here is how to reproduce it:
When a query executed via QConnection.sync() is interrupted (e.g. via ctrl+c), the next query on the same QConnection handle returns the results of the previous query.
example:
qc = qconnection.QConnection(host, port=port, username=username)
qc.open()
res1 = qc('select * from trade where sid=1', pandas=True)
res2 = qc('select * from trade where sid=2', pandas=True)  # interrupt with ctrl+c
then run again:
res1 = qc('select * from trade where sid=1', pandas=True)
res2 = qc('select * from trade where sid=2', pandas=True)
res1 will have the results for res2, or res2 will have the results for res1, or both.
This is true for qpython version '1.1.0b3' and pandas version '0.16.2' running on Linux.
If you do qc.close(), qc.open() before each call to sync(), the problem disappears.
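The close/reopen remediation above can be wrapped so an interrupted query can never leave a stale response on the socket. A sketch using a stub connection instead of a live q server (StubConnection is hypothetical; a real QConnection would be passed in):

```python
class StubConnection:
    # Stand-in for qpython.qconnection.QConnection (no server needed).
    def __init__(self):
        self.reopened = False

    def sync(self, query, **kwargs):
        raise KeyboardInterrupt  # simulate ctrl+c mid-query

    def close(self):
        pass

    def open(self):
        self.reopened = True

def safe_sync(qc, query, **kwargs):
    # If the query is interrupted, reset the connection so the next
    # call cannot read the previous query's response off the socket.
    try:
        return qc.sync(query, **kwargs)
    except BaseException:
        qc.close()
        qc.open()
        raise

qc = StubConnection()
try:
    safe_sync(qc, 'select * from trade where sid=2')
except KeyboardInterrupt:
    pass
print(qc.reopened)  # True
```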
It seems that np.datetime64 assumes UTC when KDB is showing a local timestamp. Any way to get around this?
In [67]: conn.sync("string .z.Z")
Out[67]: '2015.08.25T18:03:28.368'
In [69]: print conn.sync(".z.Z")
2015-08-25T14:03:36.137-0400 [metadata(qtype=-15)]
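One workaround, assuming you know the q server's timezone (America/New_York is an assumption here that matches the -0400 offset above), is to treat the .z.Z value as server-local time and convert it to UTC yourself; querying .z.z, which is already UTC, avoids the ambiguity entirely. A stdlib-only sketch:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# .z.Z is the q server's local time, while numpy.datetime64 treats
# naive values as UTC. Localize the raw value, then convert:
local = datetime(2015, 8, 25, 18, 3, 28, 368000,
                 tzinfo=ZoneInfo('America/New_York'))
utc = local.astimezone(timezone.utc)
print(utc.isoformat())  # 2015-08-25T22:03:28.368000+00:00
```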
Is there a workaround for passing more than 8 parameters?
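q functions accept at most 8 parameters, so the usual workaround is to pack everything into a single list or dictionary and index into it on the q side. A sketch (the q lambda and value names are illustrative, not from this repo):

```python
# 10 values, over q's limit of 8 parameters for a single function call:
params = ['IBM', '2015.01.01', '2015.12.31', 1, 2, 3, 4, 5, 6, 7]

# Passing them as one list means only one parameter goes over the wire;
# the q side unpacks by index, e.g. (hypothetical):
#   q.sync('{[p] doWork[p 0; p 1; p 2]}', params)
print(len(params))  # 10
```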
Hi, Masters:
I'm trying to test qpython connecting to a TickerPlant; the script "publisher.py" demonstrates the function.
But while running it, I got an "Unable to get repr for <class 'main.PublisherThread'>" message. Do you know what caused it and how to fix it?
Thanks
Zheng
from https://docs.python.org/3.8/whatsnew/3.7.html?highlight=reserved%20words:
Backwards incompatible syntax changes:
async and await are now reserved keywords.
This prevents us from importing qpython, as it complains that
def async(self, query, *parameters, **options):
is invalid syntax.
It would be nice if there was a setting which allowed getting data as str, rather than bytes, in Python 3 when used with pandas=True. I'm aware of the discussion in #35 and the related example of a custom reader here, so it's definitely possible to achieve this. It just seems like there are quite a few things you need to override to handle all the cases, which makes it very easy to get wrong. If this is not going to be officially supported out of the box, then at least a recipe with a QReader implementation which handles all the possible cases would be extremely helpful.
I feel like Python 3 users would prefer str over bytes in the vast majority of cases, and using a pandas DataFrame is essential in most scientific applications. If anything, at first I was naively expecting that passing encoding to QConnection would do exactly that: if you don't pass any encoding you get bytes, but if you do, you get decoded strings.
Thanks!
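Until this is supported out of the box, one blunt workaround is to decode byte columns after the fact rather than inside the reader. A stdlib-only sketch (column names are hypothetical):

```python
def decode_column(values, encoding='utf-8'):
    # Post-process a qPython result column: decode byte strings to str,
    # leaving other values (floats, None, ...) untouched.
    return [v.decode(encoding) if isinstance(v, bytes) else v
            for v in values]

# With pandas=True you could apply it per column, e.g.:
#   df['sym'] = decode_column(df['sym'])
print(decode_column([b'IBM', b'MSFT', None]))  # ['IBM', 'MSFT', None]
```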
Hi,
Not an issue, but just curious whether qpython can support a tcps connection?
h:hopen :tcps://hostname:port[:username:password]
I would like to connect to a TLS-enabled q server from a Windows client that has OpenSSL installed. If it's not already supported, is it possible to wrap the socket connection in a TLS wrapper?
Thanks
qPython raises an error while deserializing a table containing a char column.
Test case:
flip `name`iq`grade!(`Dent`Beeblebrox`Prefect;98 42 126;"a c")
Table meta:
c | t f a
-----| -----
name | s
iq | j
grade| c
Results in:
Traceback (most recent call last):
File "D:\dev\workspace\qPython\samples\console.py", line 39, in <module>
result = q(x)
File "D:\dev\workspace\qPython\qpython\qconnection.py", line 174, in __call__
return self.sync(parameters[0], *parameters[1:])
File "D:\dev\workspace\qPython\qpython\qconnection.py", line 146, in sync
response = self.receive(data_only = False)
File "D:\dev\workspace\qPython\qpython\qconnection.py", line 169, in receive
result = self._reader.read(raw)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 101, in read
message.data = self.read_data(message.size, raw, message.is_compressed)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 151, in read_data
return raw_data if raw else self._read_object()
File "D:\dev\workspace\qPython\qpython\qreader.py", line 160, in _read_object
return reader(self, qtype)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 268, in _read_table
return qtable(columns, data, qtype = QTABLE)
File "D:\dev\workspace\qPython\qpython\qcollection.py", line 126, in qtable
table = numpy.core.records.fromarrays(data, names = ','.join(columns))
File "C:\Python27\lib\site-packages\numpy\core\records.py", line 562, in fromarrays
raise ValueError("array-shape mismatch in array %d" % k)
ValueError: array-shape mismatch in array 2
Hello,
I am trying to use qpython for the first time. One of my use cases is to run a query and dump the data to a CSV file. Is there an easy way to do so? I can write individual rows, but how do I get the column names?
Many thanks.
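A QTable behaves like a numpy record array, so the column names are available as result.dtype.names and the rows iterate as tuples; with pandas=True the result is a DataFrame and df.to_csv() writes the header for you. A stdlib sketch of the record-array route, with stand-in values where a live query result would be:

```python
import csv
import io

columns = ('sym', 'price')               # stand-in for result.dtype.names
rows = [('IBM', 1.0), ('MSFT', 2.0)]     # stand-in for iterating the result

buf = io.StringIO()                      # use open('out.csv', 'w', newline='') for a file
writer = csv.writer(buf)
writer.writerow(columns)                 # header row with the column names
writer.writerows(rows)
print(buf.getvalue().splitlines()[0])    # sym,price
```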
Hi there!
I'm currently trying to get the Twisted integration example you provided to work, but I'm running into a few issues.
First of all: when I try to connect to kdb, the line self.transport.write(self.credentials + '\3\0') throws a TypeError telling me "Data must not be unicode". So I tried changing it to self.transport.write(str(self.credentials + '\0').encode("utf-8")) (not the most elegant solution, I know) and the error disappeared... But as it usually goes, another showed up.
This time a TypeError with the message "not all arguments converted during string formatting" popped up, and I cannot find where it is coming from. Is this about data coming from kdb that is not being converted, or outgoing data?
The kdb database I'm currently working with has no users, so I did not provide a username and password: factory = IPCClientFactory('', '', onConnectSuccess, onConnectFail, onMessage, onError).
Could that be in the way of success? I'm a bit clueless...
Thanks in advance!
When trying to pass from pandas to q, the time index in the DataFrame (the Date column) doesn't quite make it to the q side:
import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5] # explore first 5 rows of the DataFrame
# Out:
# Open High Low Close Volume Adj Close
# Date
#2010-01-04 10.17 10.28 10.05 10.28 60855800 9.43
#2010-01-05 10.45 11.24 10.40 10.96 215620200 10.05
#2010-01-06 11.21 11.46 11.13 11.37 200070600 10.43
#2010-01-07 11.46 11.69 11.32 11.66 130201700 10.69
#2010-01-08 11.67 11.74 11.46 11.69 130463000 10.72
q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server
# Out:
# Open High Low Close Volume Adj Close
#0 10.17 10.28 10.05 10.28 60855800 9.43
#1 10.45 11.24 10.40 10.96 215620200 10.05
#2 11.21 11.46 11.13 11.37 200070600 10.43
#3 11.46 11.69 11.32 11.66 130201700 10.69
#4 11.67 11.74 11.46 11.69 130463000 10.72
In any case, thank you for the excellent qPython package!
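The DatetimeIndex is not serialized as a column, so one workaround is to promote it to an ordinary column before sending. A sketch (pandas only; the final q call is commented out since it needs a live server):

```python
import pandas as pd

# Reconstruct a frame shaped like the Yahoo download above:
f = pd.DataFrame({'Open': [10.17, 10.45]},
                 index=pd.to_datetime(['2010-01-04', '2010-01-05']))
f.index.name = 'Date'

f = f.reset_index()          # 'Date' becomes a regular column
print(list(f.columns))       # ['Date', 'Open']
# q('set', numpy.string_('yahoo'), f) would now include the Date column
```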
On behalf of David Roberts as reported in https://groups.google.com/forum/#!topic/exxeleron/yDTruQw8wvw
qwriter does not flush buffers resulting in any sync or async query being able to stall.
Line 80 in qwriter needs to be sendall, not send, or Python does not dispatch and flush the whole buffer.
To reproduce, run on a Linux server with default network buffers a few ms distant from the q server (the Windows socket stack doesn't seem to exhibit the same behaviour):
import random
import string

from qpython import qconnection, qtype
from qpython.qcollection import qlist

def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

text = []
for a in range(1000):
    text.append(id_generator(100))

q = qconnection.QConnection()
q.open()
print q("1b")
print q.sync("{count x}", qlist(range(100000), qtype=qtype.QINT_LIST))
print q.sync("{count x}", qlist(text, qtype=qtype.QSYMBOL_LIST))
print q("1b")
Hi,
I once opened an issue about strings in qPython being returned as bytes rather than unicode strings in Python 3:
#35
The issue was closed after some improvements to the QReader were implemented.
I think there is still room for improvement, as even with the current code I struggle to override the behaviour.
The problem is that I want to use PandasQReader.
Now, PandasQReader inherits from QReader, so I can't really inject my class in between; I would have to copy PandasQReader so that it inherits from my class.
This is similar to the stream framework in Java: there, each stream class introduces some new behaviour, and the way to make them work together is to wrap one inside the other, rather than making them inherit from a particular one.
So if PandasQReader took a QReader in its constructor, I could pass my modified class at runtime instead of the default one.
Unless I have missed something.
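Besides constructor injection, Python also allows splicing behaviour into an existing hierarchy with a mixin and cooperative super(). A sketch with stand-in classes (deliberately not the real QReader API):

```python
class Reader:
    # stands in for QReader
    def read_symbol(self):
        return b'abc'

class PandasReader(Reader):
    # stands in for PandasQReader
    pass

class DecodeMixin:
    def read_symbol(self):
        # cooperative override: decode whatever the next class in the
        # MRO returns, without copying any of its code
        return super().read_symbol().decode('utf-8')

class DecodingPandasReader(DecodeMixin, PandasReader):
    pass

print(DecodingPandasReader().read_symbol())  # abc
```

Whether this works against the real classes depends on how QReader dispatches its type readers, so treat it as a design sketch rather than a drop-in fix.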
When reading from KDB into a Python 3 pandas DataFrame, the documentation says:
symbol (KDB) => numpy.bytes_ (Python)
string (KDB) => bytes (Python)
However, I found that symbol is also read as bytes with pandas=True.
More importantly, when writing a pandas DataFrame to KDB, I tested the type conversion:
str (Python) => symbol (KDB)
numpy.bytes_ (Python) => symbol (KDB)
bytes (Python) => symbol (KDB)
In a plain list, bytes (Python) is correctly converted to string (KDB).
But the bug shows for pandas DataFrames.
Hi - not an issue, just a support request but I do not see details of support.
Let's say I want to call a remote function from Python and pass it a dictionary. Inside the dict I have an entry 'starttime' for which I pass a QTemporal; all good.
Now I do not want to pass a concrete time; I want to pass .z.p, i.e. have the server use the current time. Is that possible in qpython?
Thanks.
Not sure if this is even possible given the IPC protocol used by kdb, but is there a way to transparently cast from q's temporals into numpy datetimes? Preferably a flag in QConnection.query, or a global config, that would automatically cast a query's temporal results to numpy datetimes.
I've made a flask-kdb extension using qpython, which for the record was a dream to work with (qpython and flask). To test out its design, I made an in-browser REPL link.
In doing so, the biggest issue I've had is the rendering of tables with datetimes in them:
I'd welcome some comments on what to do to resolve this. My current thinking is, if it's a kdb thing, the utility to convert the column could be as simple as running a \t <whatever> to get the metadata I need.
Feel free to close this issue if this is a purely kdb thing that you can't fix.
On Python 2.7: I'm trying to send non-ASCII (utf-8) data to a q process. It works fine when sending the data as a QSYMBOL (using str.encode('utf-8')), but always fails when trying to send the data as a QSTRING.
If I send the data as a unicode string I get:
QWriterException: Unable to serialize type: <type 'unicode'>
in line 119 of qwriter.py.
If I convert the unicode string to a str type (using str.encode('utf-8')) I get:
UnicodeDecodeError: 'ascii' codec can't decode byte [...]: ordinal not in range(128).
in line 167 of qwriter.py.
Setting encoding to 'utf-8' when initialising the connection doesn't help either.
Do I need to write a custom _write_string override to get this to work properly?
Hi,
I am using Python 3 and when I query my employer's kdb server I get back a lot of QSYMBOL and QSYMBOL_LIST values, which are converted to numpy.string_, which as I understand is just bytes.
This is really annoying, as the rest of my code uses plain Python 3 strings.
Would it be possible for the user to specify an encoding and have them converted to str?
Maybe using the QReader mapping mechanism? Is it private, or can it be overwritten?
Test case:
(42;::;`foo)
qPython fails to deserialize a symbol list with a null as its first element.
Test cases:
``
``abc
Results in:
Traceback (most recent call last):
File "D:\dev\workspace\qPython\samples\console.py", line 39, in <module>
result = q(x)
File "D:\dev\workspace\qPython\qpython\qconnection.py", line 174, in __call__
return self.sync(parameters[0], *parameters[1:])
File "D:\dev\workspace\qPython\qpython\qconnection.py", line 146, in sync
response = self.receive(data_only = False)
File "D:\dev\workspace\qPython\qpython\qconnection.py", line 169, in receive
result = self._reader.read(raw)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 101, in read
message.data = self.read_data(message.size, raw, message.is_compressed)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 151, in read_data
return raw_data if raw else self._read_object()
File "D:\dev\workspace\qPython\qpython\qreader.py", line 162, in _read_object
return self._read_list(qtype)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 226, in _read_list
symbols = self._buffer.get_symbols(length)
File "D:\dev\workspace\qPython\qpython\qreader.py", line 387, in get_symbols
raise QReaderException('Failed to read symbol from stream')
qpython.qreader.QReaderException: Failed to read symbol from stream
Enforcing the data type as QFLOAT_LIST doesn't seem to work, as the KDB+ result is real (e), not float (f). Am I missing something?
from pandas import Series,DataFrame
import numpy as np
import qpython.qconnection as qconnection
from qpython import MetaData
from qpython.qtype import QFLOAT_LIST
d = {'col' : Series([1., 2.2, 3.45, 4.6564])}
tbl = DataFrame(d)
tbl[['col']] = tbl[['col']].astype(np.float)
tbl.meta = MetaData(col = QFLOAT_LIST)
q = qconnection.QConnection(host = '10.10.4.220', port = 5000, pandas = True)
q.open()
q('set', np.string_('tbl'), tbl)
q('meta tbl')
# t f a
# c
# col e
Thank you
Code sample:
q.async(".gw.asyncfunc", QLambda("raze"))
raze is not a lambda:
q)type raze
107h
The user's code should be:
q.async(".gw.asyncfunc", QLambda("{raze x}"))
Solution: add validation to QLambda to ensure that the expression is enclosed in { and }.
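The proposed validation, sketched as stand-alone code (not the library's actual implementation):

```python
def validate_lambda(expression):
    # Reject expressions that are not enclosed in braces, since only
    # {...} definitions are q lambdas (type 100h).
    expression = expression.strip()
    if not (expression.startswith('{') and expression.endswith('}')):
        raise ValueError('QLambda expression must be enclosed in { and }')
    return expression

print(validate_lambda('{raze x}'))   # {raze x}
# validate_lambda('raze') would raise ValueError
```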
I am currently using Python 2.7.12 with qpython 1.2.2 and numpy 1.10.4.
Running the following script does not send a NaN float to KDB+ but instead an empty string.
from qpython.qtype import qnull,QDOUBLE,QFLOAT,_QNAN64
from qpython import qconnection
import numpy
q = qconnection.QConnection(host = 'localhost', port = 5010)
q.open()
q.sync('foo',numpy.string_('test_trade'),[numpy.string_('PYTHON'),qnull(QDOUBLE)])
or
q.sync('foo',numpy.string_('test_trade'),[numpy.string_('PYTHON'),_QNAN64])
KDB+ 3.3 2015.11.03:
q)foo:{[x;y] show y}
q)IBM " "
2nd example, q.sync('foo',numpy.string_('test_trade'),[numpy.string_('IBM'),qnull(-7)]):
q)IBM
0N
Seems like a bug when trying to send a null float datatype.
Attempting to write a DataFrame/Series with an integer index that does not contain zero will throw an exception:
df = DataFrame({'k':[1,2,3],'v':['a','b','c']}).set_index('k')
q.sync('`test set',df)
File "C:\Anaconda3\envs\pyq\lib\site-packages\qpython\_pandas.py", line 157, in _write_pandas_series
qtype = Q_TYPE.get(type(data[0]), QGENERAL_LIST)
KeyError: 0
This is caused by the data[0] lookup. With a non-numeric index data[0] will return the first row as expected, however when the index is numeric it will search for 0 and find nothing. Explicit indexing by row number (via iloc) fixes the issue.
_pandas.py:157, before:
qtype = Q_TYPE.get(type(data[0]), QGENERAL_LIST)
after:
qtype = Q_TYPE.get(type(data.iloc[0]), QGENERAL_LIST)
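The label-vs-position distinction behind this bug can be reproduced without qPython at all; a small sketch:

```python
import pandas as pd

# Integer index that does not contain 0, as in the report:
s = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])

caught = False
try:
    s[0]            # label-based lookup: there is no label 0 -> KeyError
except KeyError:
    caught = True

print(s.iloc[0])    # a  (positional lookup is unaffected)
```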
Running qPython 1.1 dev fails when running
print q.sync('{til x}', 10)
It generates the following error:
print(q.sync('{til x}', 10))
File "C:\Python3403_64\lib\site-packages\qpython\qconnection.py", line 268, in sync
response = self.receive(data_only = False, **options)
File "C:\Python3403_64\lib\site-packages\qpython\qconnection.py", line 341, in receive
result = self._reader.read(**self._options.union_dict(**options))
File "C:\Python3403_64\lib\site-packages\qpython\qreader.py", line 151, in read
message.data = self.read_data(message.size, message.is_compressed, **options)
File "C:\Python3403_64\lib\site-packages\qpython\qreader.py", line 228, in read_data
return raw_data if options.raw else self._read_object(options)
File "C:\Python3403_64\lib\site-packages\qpython\qreader.py", line 237, in _read_object
return reader(self, qtype, options)
File "C:\Python3403_64\lib\site-packages\qpython\qreader.py", line 248, in _read_error
raise QException(self._read_symbol(options = options))
qpython.qtype.QException: b'type'
Environment: Windows 7 64 bit, Python 3.4, qpython 1.1 dev
This query works with Python 2.7, qpython 1.0.
Kind regards,
David Bieber
Environment:
centos7 x64
kdb+ 3.5
python 3.5
Operations:
It happens that client [B]'s receive call does not work (it blocks). However, after I replace sync with async it works well.
We encountered the recently fixed null char issue.
If the QConnection attributes are bad (i.e. the q server isn't up or the port number is wrong), it's handled incorrectly. It should only take an error catch in https://github.com/exxeleron/qPython/blob/master/qpython/qconnection.py#L115-L119 to fix the issue.
# kdb is running on port 5000
In [1]: from qpython import qconnection
In [2]: q = qconnection.QConnection('localhost', 5000)
In [3]: q.open()
In [4]: q.is_connected()
Out[4]: True
In [5]: q = qconnection.QConnection('localhost', 6000)
In [6]: q.open()
---------------------------------------------------------------------------
error Traceback (most recent call last)
<ipython-input-6-486c86d6f873> in <module>()
----> 1 q.open()
/Users/wjm/repos/qPython/qpython/qconnection.pyc in open(self)
106 raise QConnectionException('Host cannot be None')
107
--> 108 self._init_socket()
109 self._initialize()
110
/Users/wjm/repos/qPython/qpython/qconnection.pyc in _init_socket(self)
116 '''Initialises the socket used for communicating with a q service,'''
117 self._connection = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
--> 118 self._connection.connect((self.host, self.port))
119 self._connection.settimeout(self.timeout)
120
/Users/wjm/anaconda/envs/flask-kdb/lib/python2.7/socket.pyc in meth(name, self, *args)
222
223 def meth(name,self,*args):
--> 224 return getattr(self._sock,name)(*args)
225
226 for _m in _socketmethods:
error: [Errno 61] Connection refused
In [7]: q.is_connected()
Out[7]: True
I found the time wrong in the example console.py.
In q:
q).z.z
2016.06.15T22:49:42.868
q).z.Z
2016.06.16T10:49:47.925
In qpython:
Q).z.z
2016-06-16T06:50:16.811+0800 [metadata(qtype=-15)]
Q).z.Z
2016-06-16T18:50:21.444+0800 [metadata(qtype=-15)]
Hi,
Having an issue with nested lists: is there specific qtable metadata that needs adding to the query call to allow numpy to take in nested lists?
cheers,
David
qConnection("([] a:1000#`a; b:1000#enlist til 10)")
ValueError: array-shape mismatch in array 1
/usr/local/lib/python2.7/dist-packages/numpy-1.9.0-py2.7-linux-x86_64.egg/numpy/core/records.py(562)fromarrays()
561 if testshape != shape:
--> 562 raise ValueError("array-shape mismatch in array %d" % k)
563
Is it possible to put a fix in the next release to use frombuffer when converting the raw q data in qreader.py?
lib\site-packages\qpython\qreader.py:299: DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead
data = numpy.fromstring(raw, dtype = conversion)
Looks like recently (26 Dec) @SKolodynski committed these long-awaited fixes for async, fromstring and some other minor but nasty stuff.
Is there any chance that a new release with these fixes included will soon be made on pip?
Quick question: how do I insert data and convert from something like a pandas dataframe?
I can do the following:
q('(myHeader:5#myBars)')
df = q('(myHeader)')
print df
print df.meta
print q('type', df)
but then if I want to insert the data just returned I run into problems. I am probably not doing this correctly.
q.query(qconnection.MessageType.SYNC, 'myHeader2:', df)
I've tried other ways to insert a pandas dataframe as well. No luck. Any help is greatly appreciated.
Jebadiah
Came across the following using Python 2.7.10 |Anaconda 2.2.0 (64-bit) querying kdb 3.1 l64
r = query_kdb.query(REGION, '([]2#.z.d)')
20150728 15:31:37.612 INFO IPC version: 3. Is connected: True
20150728 15:31:37.612 INFO Querying KDB. port:[kdbtest], host:[1234], username:[user], timeout:[4.0].
20150728 15:31:37.612 INFO Query: ([]2#.z.d)
r
QTable([(5687,), (5687,)],
dtype=[('d', '<i4')])
r[0]
(5687,)
type(r[0])
<class 'numpy.core.records.record'>
r0result = query_kdb.query(REGION, '.z.d')
20150728 15:24:26.910 INFO IPC version: 3. Is connected: True
20150728 15:24:26.910 INFO Querying KDB. port:[kdbtest], host:[1234], username:[user], timeout:[4.0].
20150728 15:24:26.910 INFO Query: .z.d
result
<qpython.qtemporal.QTemporal object at 0x7f67c929e3d0>
It seems the type remains an integer for date columns?
I've noticed that queries for large data sets fail. My example is a 69 million row table with 5 columns.
The IPC for this table from q to q works fine. But qPython fails. The actual error seems to be an infinite hang and the call to q.sync() never returns.
I was able to create a fix, which I'll explain as I detail the two problems I found.
Problem 1: Message Size Overflow
I put some debugging statements into qreader.py and found that in QMessage, the read_header() function has the following code: message_size = self._buffer.get_int(). This call returns a negative number for my query, which I'm guessing means that the size read from the IPC message was read as signed and overflowed. I added a get_uint() function to BytesBuffer to get an unsigned integer for the message size, which gets me a positive size as I would expect:
def get_uint(self):
    return self.get('I')
This solves problem 1.
Problem 2: Socket Read Length Too Big
After fixing problem 1, a failure happens in QReader.read_data()
The following line of code creates an OverflowError: signed integer is greater than maximum
:
raw_data = self._read_bytes(message_size - 8)
However, I was able to resolve the problem by adding from StringIO import StringIO
and changing _read_bytes()
to do the following:
def _read_bytes(self, length):
    if not self._stream:
        raise QReaderException('There is no input data. QReader requires either stream or data chunk')
    if length == 0:
        return b''
    else:
        CHUNKSIZE = 2048
        remaining = length
        buff = StringIO()
        while remaining > 0:
            chunk = self._stream.read(min(remaining, CHUNKSIZE))
            if chunk:
                remaining = remaining - len(chunk)
                buff.write(chunk)
            else:
                break
    data = buff.getvalue()
    if len(data) == 0:
        raise QReaderException('Error while reading data')
    return data
This seems to stem from the fact that you can't ask a file-like object to read more than a signed-integer number of bytes. To get around that, I read in chunks and combine the chunks using a StringIO.
I did no work to optimize CHUNKSIZE, and I also do not know whether StringIO and chunked reads are the best way to go about this.
A better option, and closest to the original implementation, might be something like (pseudo-code):
if length <= MAX_READABLE_LENGTH:
    data = read_in_one_shot_as_before()
else:
    data = read_in_chunks_as_proposed()
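The chunked half of that pseudo-code can be sketched as stand-alone runnable Python (Python 3 / BytesIO rather than StringIO; this is an illustration, not the library's code):

```python
import io

CHUNKSIZE = 2048  # untuned, as in the report

def read_bytes_chunked(stream, length):
    # Read `length` bytes in bounded chunks and reassemble, so no
    # single read() call exceeds what the platform can handle.
    buf = io.BytesIO()
    remaining = length
    while remaining > 0:
        chunk = stream.read(min(remaining, CHUNKSIZE))
        if not chunk:
            break  # short read: the stream ended early
        remaining -= len(chunk)
        buf.write(chunk)
    return buf.getvalue()

data = read_bytes_chunked(io.BytesIO(b'x' * 5000), 5000)
print(len(data))  # 5000
```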
I'm curious to know what you think, and if you would try to address this. With the somewhat recent increase of the IPC limit to 1TB, I'm guessing this will happen more and more.
Thanks,
Derek
qPython cannot represent a q dictionary where:
Sample:
`abc`def`gh!([] one: 1 2 3; two: 4 5 6)
This is more a question than an issue. I would like to know if you have an implementation or patch for Kona (https://github.com/kevinlawler/kona).
Thanks!
Hi,
Thank you for this fantastic tool. I'm just wondering if there is a plan to upgrade and support IPC 3.4? From what I understand, there is currently a data size limitation of 2GB in IPC version 3. I know they have increased the data size to 1TB in IPC version 3.4. There is an increasing need for data size to go beyond 2GB size limit.
Thank you very much in advance
Jeffrey
The uncompress step on larger results is very slow.
The test query result is a 20,000 row table with bid/ask columns. From a q IDE this takes <200ms.
qPython receives the data from the server quickly (<100ms), but the uncompress itself takes 4-5 seconds.
Uncompression is handled in qreader.py, the _uncompress function called by read_data.
Same test in Java and the uncompress time is negligible.
Details:
I also tried the latest 0.21.0, which was just released, with the same result:
In [8]: pd.__version__
Out[8]: u'0.21.0'
Demonstration of failure:
In [1]: import qpython.qconnection
In [2]: qc = qpython.qconnection.QConnection(host="****", port=32423)
In [3]: qc.open()
In [4]: qc.sync('([]date:2017.01.01 0N 0Wd)', pandas=True)
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: -5877611-06-21 00:00:00
Demonstration of proper functionality when infinity is removed:
In [5]: qc.sync('([]date:2017.01.01 0Nd)', pandas=True)
Out[5]:
date
0 2017-01-01
1 NaT
According to the release notes (http://pandas.pydata.org/pandas-docs/version/0.20.3/whatsnew.html) pandas has added bounds checking to pd.Timestamp()
Demonstration of it working in 0.19 (in a manner: it still overflows to some date generally less than today, which is probably undesirable, albeit not crashing):
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: u'0.19.2'
In [3]: import qpython.qconnection
In [4]: qc = qpython.qconnection.QConnection(host="*****", port=32423)
In [5]: qc.open()
In [6]: qc.sync('([]date:2017.01.01 0N 0Wd)', pandas=True)
Out[6]:
date
0 2017-01-01 00:00:00.000000000
1 NaT
2 1834-02-05 07:42:50.670153728
This also applies to negative infinity, and I presume to the other temporal types as well.
For pandas, dates will have to be bounded by the following range, or this exception will be thrown:
In [7]: pd.Timestamp.min
Out[7]: Timestamp('1677-09-21 00:12:43.145225')
In [8]: pd.Timestamp.max
Out[8]: Timestamp('2262-04-11 23:47:16.854775807')
The documentation does not address how to send replies to inbound synchronous KDB calls. For example, if the python code registers with a KDB server and then receives synchronous requests to perform some function and send a reply back. Sending back a response via a standard async call doesn't seem to work.
When using the pandas reader, all strings are converted to byte strings. Overwriting the QReader doesn't solve it.
I took the new method from here: https://qpython.readthedocs.io/en/latest/usage-examples.html?highlight=StringQReader