
marketflow's People

Contributors

davclark, deniederhut, juanshishido, rayraycano, rdhyee


marketflow's Issues

Setup.py not allowing module import

After python setup.py install, I'm attempting to import raw_taq from taq and getting this error:

Traceback (most recent call last):
  File "taq/generate_test_data.py", line 6, in <module>
    from pytaq import raw_taq
ImportError: No module named 'pytaq'
Majora:python-taq dillon$ python taq/generate_test_data.py /Users/dillon/transfer/EQY_US_ALL_BBO_20140206.zip test_data_public.zip 10
Traceback (most recent call last):
  File "taq/generate_test_data.py", line 6, in <module>
    from taq import raw_taq
ImportError: No module named 'taq'
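
A quick way to confirm what the traceback suggests (that neither name is importable after the install) is to ask the import system directly. This is only a diagnostic sketch; the helper name is_importable is made up for illustration, and it does not say why the install failed. A common cause is a setup.py that does not list the package in its packages argument, but that is an assumption about this repo, not something the traceback confirms.

```python
import importlib.util


def is_importable(name):
    """Return True if a top-level module or package called `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Check both names that appear in the tracebacks above.
for candidate in ("taq", "pytaq"):
    print(candidate, "importable:", is_importable(candidate))
```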

Consider approaches to modifying TAQ2Chunks behavior

This would include how the data is chunked (rows, symbols) as well as where it's going.

  • Odo can convert numpy chunks to whatever. Blaze is a related technology that is good at slicing and dicing data.
  • Martijn Faassen (author of the lxml bindings for Python) wrote a generics library called Reg, inspired by Zope Interfaces. Probably overkill, but worth thinking about.
  • Subclassing.
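
To make the subclassing option concrete, here is a rough sketch. RowChunker and CollectingChunker are hypothetical names, not the TAQ2Chunks API; the idea is just that "how to split" and "where the chunk goes" live in separate methods, so a subclass can override one without touching the other.

```python
class RowChunker:
    """Hypothetical stand-in for a chunking reader; not the TAQ2Chunks API."""

    def __init__(self, rows, chunksize):
        self.rows = list(rows)
        self.chunksize = chunksize

    def split(self):
        """Yield consecutive slices of at most `chunksize` rows."""
        for start in range(0, len(self.rows), self.chunksize):
            yield self.rows[start:start + self.chunksize]

    def emit(self, chunk):
        """Decide where a chunk goes; the base class just hands it back."""
        return chunk

    def __iter__(self):
        for chunk in self.split():
            yield self.emit(chunk)


class CollectingChunker(RowChunker):
    """Subclass that changes only the destination: chunks are also stored."""

    def __init__(self, rows, chunksize):
        super().__init__(rows, chunksize)
        self.sink = []

    def emit(self, chunk):
        self.sink.append(chunk)
        return chunk


chunker = CollectingChunker(range(5), chunksize=2)
chunks = list(chunker)
```

A symbol-based variant would override split() instead of emit().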

H5_TIME is unsupported...

So, we shouldn't use it. This isn't a huge issue, and it actually simplifies the logic of our code: no more special handling of a time column, it's just a float64 in an HDF5 file.
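
As a rough illustration of "just a float64", the sketch below turns a date and a time-of-day string into float seconds since the epoch. The function name, the HHMMSSmmm layout, and the use of UTC are all assumptions made for this example; the real conversion in raw_taq may differ.

```python
from datetime import datetime, timezone


def taq_time_to_float(date_str, time_str):
    """Convert 'YYYYMMDD' plus 'HHMMSSmmm' into float seconds since the epoch.

    Assumes the timestamp is in UTC, which is a simplification for this sketch.
    """
    whole_seconds = datetime.strptime(date_str + time_str[:6], "%Y%m%d%H%M%S")
    whole_seconds = whole_seconds.replace(tzinfo=timezone.utc)
    milliseconds = int(time_str[6:9])
    return whole_seconds.timestamp() + milliseconds / 1000.0


stamp = taq_time_to_float("19700101", "000001500")
```

The result is an ordinary Python float, so it can go into a float64 column with no special time type.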

I identified this in the HDF5 docs after being referred there by h5py/h5py#360. See also this.

Not sure whether this is related to dlab-projects/dlab-finance#60

Can't read new TAQ files

Working with EQY_US_ALL_BBO_20150731.zip results in BaseException("Can't map fields onto bytes_per_line")

raw_taq error call needs Error type

Line 179 in taq/raw_taq.py raises an undefined error class, Error:

/Users/dillon/Dropbox/dlab/python-taq/taq/raw_taq.py in check_present_fields(self)
    177                 return
    178 
--> 179         raise Error("Can't map fields onto bytes_per_line")
    180 
    181 

NameError: name 'Error' is not defined
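
One possible fix is to define a dedicated exception class and raise that. The sketch below is not the repo's code: TAQFormatError is a hypothetical name, and check_present_fields is rewritten here as a free function that takes the set of known line lengths as a parameter, purely so the example is self-contained.

```python
class TAQFormatError(ValueError):
    """Hypothetical error for a line length that matches no known field layout."""


def check_present_fields(bytes_per_line, known_line_lengths):
    """Return quietly if the line length is recognised, otherwise raise."""
    if bytes_per_line in known_line_lengths:
        return
    raise TAQFormatError(
        "Can't map fields onto bytes_per_line (got %d)" % bytes_per_line
    )
```

Deriving from ValueError (rather than raising BaseException) means an ordinary "except Exception" handler will catch it.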

Can't read old TAQ files

taq2h5 EQY_US_ALL_BBO_20111101.zip results in ValueError: no field of name Retail_Interest_Indicator_RPI.

Why don't our tests catch this? Well - we don't have a test datafile for each epoch. @yangraymond I guess you won't have time to create such a thing before you go?

Strange symbol in 2014 data

We find ZXYZ.A in EQY_US_ALL_BBO_20140213. This works as a valid HDF5 identifier, so I've not changed it.

Create a basic test to understand pytest

Something like:

  1. Create f(x) = x + 2
  2. Test that f(x) = x + 3
  3. See that test fails
  4. Fix the failing test

Feel free to do that in the repo... it'll help us move forward in having a test directory in the right place. Let's start with pytest.

Bcolz conversion pipeline

First - see if you can use blaze / odo to convert to bcolz. If that's not easy, just use a structure similar to hdf5.py

raw_taq does not convert chunks back into TAQ format

This functionality is needed for writing test data: i.e. we need to read in TAQ data, anonymize it, and write it back to TAQ format.

The current work-around takes individual rows from the TAQ2Chunks output, converts them to strings with numpy.to_string, and appends them to a file with b'\n'. This creates data that looks like TAQ data, but causes TAQ2Chunks to throw an error about mapping fields from the file:

---------------------------------------------------------------------------
BaseException                             Traceback (most recent call last)
<ipython-input-6-dfe8991edd77> in <module>()
----> 1 generator = raw_taq.TAQ2Chunks('test.zip', do_process_chunk=True, chunksize=1000)

/Users/dillon/Dropbox/dlab/python-taq/taq/raw_taq.py in __init__(self, taq_fname, chunksize, do_process_chunk, chunk_type)
    213         if chunk_type == 'lines':
    214             self.iter_ = self._convert_taq()
--> 215             next(self.iter_) #read first line and setup attributes
    216         elif chunk_type == 'symbols':
    217             self.iter_ = self._symbol_taq() #make symbol_taq top level iter

/Users/dillon/Dropbox/dlab/python-taq/taq/raw_taq.py in _convert_taq(self)
    253                         self.bytes_spec = \
    254                             BytesSpec(bytes_per_line,
--> 255                                       computed_fields=[('Time', np.float64)])
    256                                       # We want this for making the PyTables
    257                                       # description:

/Users/dillon/Dropbox/dlab/python-taq/taq/raw_taq.py in __init__(self, bytes_per_line, computed_fields)
    105         '''
    106         self.bytes_per_line = bytes_per_line
--> 107         self.check_present_fields()
    108 
    109         # The "easy" dtypes are the "not datetime" dtypes

/Users/dillon/Dropbox/dlab/python-taq/taq/raw_taq.py in check_present_fields(self)
    177                 return
    178 
--> 179         raise BaseException("Can't map fields onto bytes_per_line")
    180 
    181 

BaseException: Can't map fields onto bytes_per_line
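
Since the reader decides on a field layout from the number of bytes per line, one thing worth ruling out is that the rewritten lines have a different length from the originals, for example because of the line terminator. That is a guess, not a confirmed diagnosis. The sketch below shows one way to write fixed-width records with an explicit terminator; write_fixed_width, the widths, and the b"\r\n" default are all assumptions for illustration.

```python
def write_fixed_width(records, widths, line_end=b"\r\n"):
    """Pad each bytes field to its width and join the records into one blob.

    Every output line has the same length: sum(widths) + len(line_end).
    """
    lines = []
    for record in records:
        if len(record) != len(widths):
            raise ValueError("record has %d fields, expected %d"
                             % (len(record), len(widths)))
        parts = []
        for value, width in zip(record, widths):
            if len(value) > width:
                raise ValueError("field %r is wider than %d bytes" % (value, width))
            parts.append(value.ljust(width))  # pad on the right with spaces
        lines.append(b"".join(parts) + line_end)
    return b"".join(lines)


blob = write_fixed_width([(b"AB", b"1"), (b"XYZ", b"22")], widths=[3, 2])
```

Comparing the length of one written line against the length of one line from the original file would show quickly whether the two formats agree.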

TAQ2Chunks with chunk_type='symbols' yields empty numpy arrays

E.g.

generator = raw_taq.TAQ2Chunks(fp, chunk_type='symbols', chunksize=100000)
next(generator)

yields:

array([], 
      dtype=[('Time', '<f8'), ('hour', 'i1'), ('minute', 'i1'), ('msec', '<u2'), ('Exchange', 'S1'), ('Symbol_root', 'S6'), ('Symbol_suffix', 'S10'), ('Bid_Price', '<f8'), ('Bid_Size', '<i4'), ('Ask_Price', '<f8'), ('Ask_Size', '<i4'), ('Quote_Condition', 'S1'), ('Market_Maker', 'S4'), ('Bid_Exchange', 'S1'), ('Ask_Exchange', 'S1'), ('Sequence_Number', '<i8'), ('National_BBO_Ind', 'S1'), ('NASDAQ_BBO_Ind', 'S1'), ('Quote_Cancel_Correction', 'S1'), ('Source_of_Quote', 'S1'), ('Retail_Interest_Indicator_RPI', 'S1'), ('Short_Sale_Restriction_Indicator', 'S1'), ('LULD_BBO_Indicator_CQS', 'S1'), ('LULD_BBO_Indicator_UTP', 'S1'), ('FINRA_ADF_MPID_Indicator', 'S1'), ('SIP_generated_Message_Identifier', 'S1'), ('National_BBO_LULD_Indicator', 'S1')])
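
For comparison, here is a small sketch of what per-symbol chunking is expected to produce, using plain tuples instead of numpy records. It is not how TAQ2Chunks is implemented; chunks_by_symbol is a hypothetical helper, and it assumes the rows arrive already sorted by symbol. A useful property is that itertools.groupby never produces an empty group, so the first chunk cannot be empty if the input is not.

```python
from itertools import groupby


def chunks_by_symbol(rows, symbol_of=lambda row: row[0]):
    """Yield (symbol, rows) pairs for consecutive runs of the same symbol."""
    for symbol, group in groupby(rows, key=symbol_of):
        yield symbol, list(group)


# Toy rows of (symbol, bid price), already sorted by symbol.
rows = [("A", 10.0), ("A", 10.5), ("B", 20.0)]
grouped = list(chunks_by_symbol(rows))
```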
