dlab-projects / marketflow
Basic Python library for working with the TAQ (US Trade and Quote) dataset
Home Page: http://marketflow.readthedocs.org/
License: BSD 2-Clause "Simplified" License
After python setup.py install, I'm attempting to import raw_taq from taq and getting this error:
Traceback (most recent call last):
File "taq/generate_test_data.py", line 6, in <module>
from pytaq import raw_taq
ImportError: No module named 'pytaq'
Majora:python-taq dillon$ python taq/generate_test_data.py /Users/dillon/transfer/EQY_US_ALL_BBO_20140206.zip test_data_public.zip 10
Traceback (most recent call last):
File "taq/generate_test_data.py", line 6, in <module>
from taq import raw_taq
ImportError: No module named 'taq'
This would include how the data is chunked (rows, symbols) as well as where it's going.
Essentially varchar strings in pytables:
http://www.pytables.org/cookbook/hints_for_sql_users.html#column-type-declarations
So, we shouldn't use it. This isn't a huge issue, and it actually simplifies the logic of our code: no more special handling of a time column, it's just a float64 in an HDF5 file.
I identified this in the HDF5 docs after being referred there by h5py/h5py#360. See also this.
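As a rough illustration of storing time as a plain float, here is a minimal sketch that turns a raw timestamp into seconds since midnight. It assumes the raw field is a 9-byte HHMMSSmmm string, which is not stated in this thread, and taq_time_to_seconds is a hypothetical helper name, not part of the library.

```python
def taq_time_to_seconds(raw):
    """Convert a timestamp to float seconds since midnight.

    Assumption: `raw` is 9 ASCII bytes laid out as HHMMSSmmm.
    """
    if len(raw) != 9:
        raise ValueError("expected 9 bytes, got %d" % len(raw))
    text = raw.decode("ascii")
    hours = int(text[0:2])
    minutes = int(text[2:4])
    seconds = int(text[4:6])
    millis = int(text[6:9])
    # A single float value needs no special time type in the HDF5 file.
    return hours * 3600.0 + minutes * 60.0 + seconds + millis / 1000.0


stamp = taq_time_to_seconds(b"093000123")
```

The result is an ordinary Python float, so it can go into a float64 column without any time-specific handling.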
Not sure whether this is related to dlab-projects/dlab-finance#60
It makes sense to do this for performance reasons. We also want to store separate tables in HDF5 / pytables.
Working with EQY_US_ALL_BBO_20150731.zip results in BaseException("Can't map fields onto bytes_per_line")
Line 179 in taq/raw_taq uses invalid error class Error:
/Users/dillon/Dropbox/dlab/python-taq/taq/raw_taq.py in check_present_fields(self)
177 return
178
--> 179 raise Error("Can't map fields onto bytes_per_line")
180
181
NameError: name 'Error' is not defined
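One way to avoid the NameError is to raise an exception class that is actually defined. The sketch below is an illustration only: FieldMappingError, the standalone function signature, and the known_lengths argument are hypothetical and do not reflect how raw_taq.py is written.

```python
class FieldMappingError(ValueError):
    """Hypothetical exception for a line length that matches no known layout."""


def check_present_fields(bytes_per_line, known_lengths):
    """Return quietly if the line length is recognised, else raise.

    `known_lengths` is an assumed collection of supported line lengths.
    """
    if bytes_per_line in known_lengths:
        return
    raise FieldMappingError(
        "Can't map fields onto bytes_per_line (%d)" % bytes_per_line
    )


# The failure is now a catchable, well-defined exception rather than a NameError.
try:
    check_present_fields(10, {98})
except FieldMappingError as err:
    message = str(err)
```

Subclassing ValueError (rather than BaseException) keeps the error catchable by ordinary `except Exception` handlers.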
taq2h5 EQY_US_ALL_BBO_20111101.zip results in ValueError: no field of name Retail_Interest_Indicator_RPI.
Why don't our tests catch this? Well - we don't have a test datafile for each epoch. @yangraymond I guess you won't have time to create such a thing before you go?
We should figure out if we can legally distribute those IDs.
In particular, once you're sure you got what you need from them, you can delete the "Dav" versions of your notebooks!
Probably PyPI, but also anaconda.org. @davclark can help with conda packaging.
We find ZXYZ.A in EQY_US_ALL_BBO_20140213. This works as a valid HDF5 identifier, so I've not changed it.
Ask Dav for details if unclear!
cc @jaysid95
Something like:
Feel free to do that in the repo... it'll help us move forward in having a test directory in the right place. Let's start with pytest.
No idea why. Builds report no errors.
First - see if you can use blaze / odo to convert to bcolz. If that's not easy, just use a structure similar to hdf5.py
@yangraymond leads us in a tour of his testing code.
This functionality is needed for writing test data: i.e. we need to read in TAQ data, anonymize it, and write it back to TAQ format.
The current workaround takes individual rows from TAQ2Chunks output, converts them to strings with numpy.to_string, and appends them to a file with b'\n'. This creates data that looks like TAQ data, but causes TAQ2Chunks to throw an error about mapping fields from the file:
---------------------------------------------------------------------------
BaseException Traceback (most recent call last)
<ipython-input-6-dfe8991edd77> in <module>()
----> 1 generator = raw_taq.TAQ2Chunks('test.zip', do_process_chunk=True, chunksize=1000)
/Users/dillon/Dropbox/dlab/python-taq/taq/raw_taq.py in __init__(self, taq_fname, chunksize, do_process_chunk, chunk_type)
213 if chunk_type == 'lines':
214 self.iter_ = self._convert_taq()
--> 215 next(self.iter_) #read first line and setup attributes
216 elif chunk_type == 'symbols':
217 self.iter_ = self._symbol_taq() #make symbol_taq top level iter
/Users/dillon/Dropbox/dlab/python-taq/taq/raw_taq.py in _convert_taq(self)
253 self.bytes_spec = \
254 BytesSpec(bytes_per_line,
--> 255 computed_fields=[('Time', np.float64)])
256 # We want this for making the PyTables
257 # description:
/Users/dillon/Dropbox/dlab/python-taq/taq/raw_taq.py in __init__(self, bytes_per_line, computed_fields)
105 '''
106 self.bytes_per_line = bytes_per_line
--> 107 self.check_present_fields()
108
109 # The "easy" dtypes are the "not datetime" dtypes
/Users/dillon/Dropbox/dlab/python-taq/taq/raw_taq.py in check_present_fields(self)
177 return
178
--> 179 raise BaseException("Can't map fields onto bytes_per_line")
180
181
BaseException: Can't map fields onto bytes_per_line
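One plausible cause (an assumption, not confirmed in this thread) is that the rewritten lines no longer have a byte length the reader recognises. A minimal sketch of writing fixed-width records follows; format_record, the three-field layout, and the default terminator are all hypothetical, and the real TAQ layout and line ending should be taken from the original file.

```python
def format_record(values, widths, terminator=b"\n"):
    """Pack byte-string fields into one fixed-width line.

    Each field is right-padded with spaces to its width, so every
    line comes out at exactly sum(widths) + len(terminator) bytes.
    """
    if len(values) != len(widths):
        raise ValueError(
            "expected %d fields, got %d" % (len(widths), len(values))
        )
    parts = []
    for value, width in zip(values, widths):
        if len(value) > width:
            raise ValueError("field %r exceeds width %d" % (value, width))
        parts.append(value.ljust(width))  # bytes.ljust pads with b" "
    return b"".join(parts) + terminator


# Hypothetical layout: time (9 bytes), exchange (1 byte), symbol (16 bytes).
WIDTHS = (9, 1, 16)
line = format_record((b"093000123", b"N", b"ZXYZ"), WIDTHS)
```

Checking the length of each written line against the length of a line in the source file is a cheap way to catch a mismatch before handing the output back to TAQ2Chunks.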
Once this is working, notebooks and things should be updated in dlab-finance.
File is at https://github.com/BIDS-collaborative/dlab-finance/blob/master/basic-taq/raw_taq.py
Once we have that, remove from the BIDS-collaborative repo.
E.g.
generator = raw_taq.TAQ2Chunks(fp, chunk_type='symbols', chunksize=100000)
next(generator)
yields:
array([],
dtype=[('Time', '<f8'), ('hour', 'i1'), ('minute', 'i1'), ('msec', '<u2'), ('Exchange', 'S1'), ('Symbol_root', 'S6'), ('Symbol_suffix', 'S10'), ('Bid_Price', '<f8'), ('Bid_Size', '<i4'), ('Ask_Price', '<f8'), ('Ask_Size', '<i4'), ('Quote_Condition', 'S1'), ('Market_Maker', 'S4'), ('Bid_Exchange', 'S1'), ('Ask_Exchange', 'S1'), ('Sequence_Number', '<i8'), ('National_BBO_Ind', 'S1'), ('NASDAQ_BBO_Ind', 'S1'), ('Quote_Cancel_Correction', 'S1'), ('Source_of_Quote', 'S1'), ('Retail_Interest_Indicator_RPI', 'S1'), ('Short_Sale_Restriction_Indicator', 'S1'), ('LULD_BBO_Indicator_CQS', 'S1'), ('LULD_BBO_Indicator_UTP', 'S1'), ('FINRA_ADF_MPID_Indicator', 'S1'), ('SIP_generated_Message_Identifier', 'S1'), ('National_BBO_LULD_Indicator', 'S1')])