static-frame's Issues

Series.from_concat()

Support creating a Series from multiple Series, assuming non-overlapping indices.
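Here is a minimal pure-Python sketch of the intended semantics. Each Series is modeled as a hypothetical `(labels, values)` pair rather than an `sf.Series`. Labels and values are concatenated in order, and any label that appears twice raises an error, which enforces the non-overlapping assumption. The name `from_concat` here is only illustrative.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple

# A stand-in for a Series: (index labels, values). Hypothetical, for illustration only.
SeriesLike = Tuple[Sequence[Hashable], Sequence[object]]


def from_concat(containers: Iterable[SeriesLike]) -> Tuple[List[Hashable], List[object]]:
    """Concatenate labels and values in order, rejecting any repeated label."""
    labels: List[Hashable] = []
    values: List[object] = []
    seen = set()
    for index, data in containers:
        if len(index) != len(data):
            raise ValueError('each container needs as many labels as values')
        for label in index:
            if label in seen:
                # Overlapping indices would make the result's labels non-unique.
                raise KeyError(f'overlapping label: {label!r}')
            seen.add(label)
        labels.extend(index)
        values.extend(data)
    return labels, values


labels, values = from_concat([(('a', 'b'), (1, 2)), (('c',), (3,))])
print(labels, values)  # ['a', 'b', 'c'] [1, 2, 3]
```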

astype interface on TypeBlocks, Series, Frame, Index

Support assigning types for multiple columns at once on TypeBlocks, so as to support consolidating (or isolating) blocks in TypeBlocks.

Possibly support getitem-style selections on TypeBlocks and Frame, i.e., in a manner similar to assign():

>>> frame.astype[['b', 'g', 'f']](float)
>>> frame.astype['g':](bool)
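One way to support this syntax is to have `astype` return an object whose `__getitem__` captures the column selection and returns a callable that applies the type. The sketch below shows that interface shape. It models columns as a dict of lists, with no TypeBlocks involved. The `AstypeSelector` class is hypothetical, and its slices are treated as label-based and inclusive of the stop label, like `loc` selections.

```python
from typing import Any, Callable, Dict, List

Columns = Dict[str, List[Any]]


class AstypeSelector:
    """Hypothetical sketch: `selector[key](dtype)` retypes only the selected columns."""

    def __init__(self, columns: Columns) -> None:
        self._columns = columns

    def __getitem__(self, key: Any) -> Callable[[Callable[[Any], Any]], Columns]:
        names = list(self._columns)
        if isinstance(key, slice):
            # Label-based slice; the stop label is included, as with loc-style selection.
            start = names.index(key.start) if key.start is not None else 0
            stop = names.index(key.stop) + 1 if key.stop is not None else len(names)
            selected = names[start:stop]
        elif isinstance(key, list):
            selected = key
        else:
            selected = [key]

        def apply(dtype: Callable[[Any], Any]) -> Columns:
            # Build a new mapping; unselected columns pass through unchanged.
            return {
                name: [dtype(v) for v in col] if name in selected else col
                for name, col in self._columns.items()
            }

        return apply


columns = {'a': [1, 2], 'b': [0, 3], 'g': [1, 0], 'f': [0, 0]}
print(AstypeSelector(columns)['g':](bool))
```

Because selection and conversion are separate steps, an implementation on TypeBlocks could use the selection to decide which blocks to consolidate or split before applying the type.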

`Series.from_items` to make data with `IndexHierarchy` indices

I want to make a Series with an IndexHierarchy index using Series.from_items. (I like from_items esp. for hard-coded data because it puts the indices directly next to their respective values, which is much easier to keep track of when reading & debugging code.)

For example, I want to do this:

series = sf.Series.from_items([
    (('one', 1), 'hello'),
    (('two', 2), 'cya'),
])

What I get is this series:

<Series>
<Index>
('one', 1) hello
('two', 2) cya
<object>   <<U5>

But what I want is this series:

<Series>
<IndexHierarchy>
one              1        hello
two              2        cya
<object>         <object> <<U5>

So:

  1. Is there an existing method for doing this?
  2. If not, is there a method that would convert from the series I made into the series I want? (Something like the opposite of reindex_flat?) If there isn't, would it be worth implementing?
  3. If "no" to all of the above, could you change from_items to interpret tuples as entries for an IndexHierarchy, without breaking anything else?
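Whatever the API ends up being, the core step is transposing the tuple labels (one per row) into one sequence per depth of the hierarchy. The pure-Python helper below illustrates that step. `labels_to_levels` is a hypothetical name and is not how static-frame builds an IndexHierarchy internally.

```python
from typing import Hashable, List, Sequence, Tuple


def labels_to_levels(labels: Sequence[Tuple[Hashable, ...]]) -> List[Tuple[Hashable, ...]]:
    """Split tuple labels into one tuple of values per hierarchy depth."""
    if not labels:
        return []
    depth = len(labels[0])
    if any(len(label) != depth for label in labels):
        raise ValueError('all tuple labels must have the same depth')
    # zip(*labels) transposes rows of labels into columns (levels).
    return list(zip(*labels))


print(labels_to_levels([('one', 1), ('two', 2)]))  # [('one', 'two'), (1, 2)]
```

Applying `zip(*levels)` again restores the row-wise tuples, which is the direction `reindex_flat` goes.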

TypeBlocks.roll

Permit rolling on one or both axes simultaneously. Implementation will be very similar to TypeBlocks._drop_blocks()
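The sketch below shows the semantics of rolling on either or both axes. It ignores blocks entirely and models the data as row-major nested lists. It follows `np.roll`'s sign convention, where positive shifts move values toward higher positions. The function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar('T')


def roll_1d(items: Sequence[T], shift: int) -> List[T]:
    """Rotate items right by `shift` positions; negative shifts rotate left."""
    if not items:
        return list(items)
    shift %= len(items)
    # With shift == 0 this is items[0:] + items[:0], i.e. an unchanged copy.
    return list(items[-shift:]) + list(items[:-shift])


def roll_2d(rows: Sequence[Sequence[T]], shift_row: int = 0, shift_column: int = 0) -> List[List[T]]:
    """Roll on axis 0 (reorder rows), then on axis 1 (rotate within each row)."""
    rolled_rows = roll_1d(rows, shift_row)
    return [roll_1d(row, shift_column) for row in rolled_rows]


print(roll_2d([[1, 2, 3], [4, 5, 6]], shift_row=1, shift_column=1))  # [[6, 4, 5], [3, 1, 2]]
```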

C Merge Checklist

On the c branch:

  • Update setup.py with name="static-frame" (it's currently "C-SF", for testing).
  • Update .travis.yml to deploy off of the master branch.
  • Update .travis.yml to use the "real" PyPI.
  • Update .travis.yml with @flexatone's PyPI credentials (see me for this).
  • Optionally, update .travis.yml to deploy to PyPI only on tags. I'd rather just deploy whenever the version changes in setup.py (which is what it does now), but we can chat about that.

On the master branch:

  • Make a new minor release.
  • Merge c.
  • Use dev releases to make the necessary changes to our conda-forge recipe. This will be the last tricky part, but I'm pretty sure I've figured it out!

PyArrow Integration

I listened to the TalkPython episode today that covered this library and I realized that this is the library I was looking for about a year ago. I have implemented parts of this library several times trying to address the pain points it tries to solve, so thank you!

Is the Apache Arrow project on your radar at all? It is very much aligned with the goals of this library and would provide several other features for free.

Series.drop()

Analog to Frame.drop(); implement with np.delete().
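`np.delete()` removes entries by integer position, so the main step in a label-based `drop` is translating labels to positions. The pure-Python sketch below models a Series as parallel label and value lists. `series_drop` is a hypothetical name.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple


def series_drop(labels: Sequence[Hashable], values: Sequence[object],
                drop: Iterable[Hashable]) -> Tuple[List[Hashable], List[object]]:
    """Return new (labels, values) with the given labels removed."""
    positions = {label: i for i, label in enumerate(labels)}
    # Translate labels to integer positions, the form np.delete() expects.
    drop_positions = set()
    for label in drop:
        if label not in positions:
            raise KeyError(label)
        drop_positions.add(positions[label])
    keep = [i for i in range(len(labels)) if i not in drop_positions]
    return [labels[i] for i in keep], [values[i] for i in keep]


print(series_drop(['a', 'b', 'c'], [1, 2, 3], ['b']))  # (['a', 'c'], [1, 3])
```

With NumPy arrays, the last step would instead be `np.delete(values, sorted(drop_positions))`, applied to both the values and the index labels.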

Single row, single column Frame has unsized values

In : f = sf.FrameGO(dict(color=('black',)))
In : f['color']
Traceback (most recent call last):
  File "/usr/lib/python3.5/code.py", line 91, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File "/home/ariza/src/rapc/static_frame/static_frame/core/frame.py", line 1109, in __getitem__
    return self._extract(*self._compound_loc_to_getitem_iloc(key))
  File "/home/ariza/src/rapc/static_frame/static_frame/core/frame.py", line 1049, in _extract
    return Series(blocks.values[0], index=index)
  File "/home/ariza/src/rapc/static_frame/static_frame/core/series.py", line 160, in __init__
    if len(self.values) != shape:
TypeError: len() of unsized object

`Frame.set_index_hierarchy` fails for `Frame`s with only one row

For example:

sf.Frame.from_records([
    ('one', 1, 'hello'),
    ('two', 2, 'cya'),
], columns=['name', 'val', 'msg']).set_index_hierarchy(['name', 'val'], drop=True)

(correctly afaik) produces this frame:

<Frame>
<Index>                       msg   <<U4>
<Index...
one                  1        hello
two                  2        cya
<object>             <object> <<U5>

But if I remove the second row, like so:

sf.Frame.from_records([
    ('one', 1, 'hello'),
], columns=['name', 'val', 'msg']).set_index_hierarchy(['name', 'val'], drop=True)

this fails with the following traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/static_frame/core/frame.py", line 2083, in set_index_hierarchy
    index = IndexHierarchy.from_labels(index_labels, name=column_name)
  File "/usr/local/lib/python3.5/dist-packages/static_frame/core/index_hierarchy.py", line 586, in from_labels
    for d, v in enumerate(label):
TypeError: 'int' object is not iterable

One detail to note is that it seems to fail on the first non-iterable set_index_hierarchy column value; it just so happens that the first is 'one' which is iterable.

Is this in some way an intended behavior?
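One plausible reading of the traceback, which is not confirmed, is that with a single row the selected index values collapse to the flat row `('one', 1)`. `from_labels` would then iterate over `'one'` and `1` as if each were a separate label. `'one'` is a string and therefore iterable, while `1` is not, which matches the observation above. Building labels column-wise with `zip()` avoids this collapse because it yields one tuple per row regardless of the row count. The helper below is hypothetical and models columns as a dict of lists.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def hierarchy_labels(columns: Dict[Hashable, Sequence[object]],
                     names: Sequence[Hashable]) -> List[Tuple[object, ...]]:
    """Build one tuple label per row from the named columns."""
    # zip() over the selected columns always yields one tuple per row, so a
    # single-row frame produces [('one', 1)] rather than the flat ('one', 1).
    return list(zip(*(columns[name] for name in names)))


one_row = {'name': ['one'], 'val': [1], 'msg': ['hello']}
print(hierarchy_labels(one_row, ['name', 'val']))  # [('one', 1)]
```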

Series.to_frame_go() produces an invalid FrameGO when axis=0

a = sf.Series((1, 2, 3))
fg = a.to_frame_go(axis=0)
fg['b'] = 'b'

Traceback (most recent call last):
 File "/home/rutherford/src/rapc/static_frame/static_frame/core/frame.py", line 2561, in __setitem__
   self._columns.append(key)
builtins.AttributeError: 'Index' object has no attribute 'append'

TypeBlocks.drop()

An optimized drop implemented on TypeBlocks, permitting better dropping in Frame.set_index() and Frame.set_index_hierarchy().

Frame.drop()

Permit independent index, columns arguments, like Frame.reindex(); implement with TypeBlocks._drop_blocks().
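The sketch below shows the semantics of independent `index` and `columns` drop arguments, where either can be given alone or both together. It models a Frame as an index list plus a dict of column lists. The function and its keyword names are hypothetical and do not reflect Frame.drop's actual signature.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def frame_drop(
    index: Sequence[Hashable],
    columns: Dict[Hashable, Sequence[object]],
    drop_index: Iterable[Hashable] = (),
    drop_columns: Iterable[Hashable] = (),
) -> Tuple[List[Hashable], Dict[Hashable, List[object]]]:
    """Drop rows and/or columns by label; either argument may be given alone."""
    drop_rows = set(drop_index)
    drop_cols = set(drop_columns)
    missing = drop_rows.difference(index) | drop_cols.difference(columns)
    if missing:
        raise KeyError(missing)
    # Positions of the rows to keep, shared by every remaining column.
    keep = [i for i, label in enumerate(index) if label not in drop_rows]
    new_index = [index[i] for i in keep]
    new_columns = {
        name: [values[i] for i in keep]
        for name, values in columns.items()
        if name not in drop_cols
    }
    return new_index, new_columns


print(frame_drop(['x', 'y', 'z'], {'a': [1, 2, 3], 'b': [4, 5, 6]}, drop_index=['y']))
```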

Using `header_is_columns` on `sf.Frame.from_csv` causes exception

A minimal example:

from io import StringIO

import static_frame as sf

INPUT_STR = '''
196412	0.0
196501	0.0
196502	0.0
196503	0.0
196504	0.0
196505	0.0'''

if __name__ == '__main__':
    works_but_wrong_headers = sf.Frame.from_tsv(StringIO(INPUT_STR), index_column=0)
    desired_but_fails = sf.Frame.from_tsv(StringIO(INPUT_STR), index_column=0, header_is_columns=False)

The exception I observe:

Traceback (most recent call last):
  File "/home/<username>/sf_test_case.py", line 15, in <module>
    desired_but_fails = sf.Frame.from_tsv(StringIO(INPUT_STR), index_column=0, header_is_columns=False)
  File "/home/<username>/src/rapc/static_frame/static_frame/core/frame.py", line 713, in from_tsv
    return cls.from_csv(fp, delimiter='\t', **kwargs)
  File "/home/<username>/src/rapc/static_frame/static_frame/core/frame.py", line 696, in from_csv
    missing_values={''},
  File "/home/<username>/.env37/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1771, in genfromtxt
    names = list(names)
TypeError: 'bool' object is not iterable
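The traceback suggests that the boolean reaches `numpy.genfromtxt`'s `names` argument, but that is an inference from the stack frames, not a confirmed diagnosis. As a stopgap until `header_is_columns=False` works, the headerless data can be parsed with the standard-library `csv` module and the resulting index and values passed to a Frame constructor (not shown). This does not fix the underlying bug. `read_headerless_tsv` is a hypothetical helper.

```python
import csv
from io import StringIO
from typing import List, Tuple

INPUT_STR = '''
196412\t0.0
196501\t0.0
196502\t0.0
196503\t0.0
196504\t0.0
196505\t0.0'''


def read_headerless_tsv(text: str) -> Tuple[List[int], List[float]]:
    """Parse two-column TSV with no header row: column 0 -> index, column 1 -> values."""
    index: List[int] = []
    values: List[float] = []
    for row in csv.reader(StringIO(text), delimiter='\t'):
        if not row:  # skip the blank line left by the opening triple quote
            continue
        index.append(int(row[0]))
        values.append(float(row[1]))
    return index, values


index, values = read_headerless_tsv(INPUT_STR)
print(index[0], values[0])  # 196412 0.0
```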

"groupby" aggregation on index hierarchy levels

I have a static_frame.Series, like so:

import static_frame as sf
import numpy as np
    
colors = ('red', 'green')
shapes = ('square', 'circle', 'triangle')
    
series = sf.Series(np.arange(6), index=sf.IndexHierarchy.from_product(shapes, colors))

I want to be able to perform some "groupby" function, like summing all like-color values together into a new Series that looks something like this:

<Series>
<Index>
red      6
green    9
<<U5>    <int64>

With a pandas.Series, I could do this:

agg_series = series.groupby(level=1).sum()

However, as far as I can tell, there's no way to do this with static_frame objects (without building the result manually). The closest functionality I can find is the group iterator functions, but those don't let you group by index hierarchy levels.

I understand that static_frame is not intended as a drop-in replacement for pandas, but this seems like a desirable feature for any data-crunching library.
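The requested aggregation can be sketched without static-frame by grouping each value on the label component at the chosen depth and summing. This pure-Python version models the Series as `(label, value)` pairs. It assumes that `IndexHierarchy.from_product(shapes, colors)` orders labels like `itertools.product`, with shape as the outer level. `sum_by_level` is a hypothetical name, not a static-frame method.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Hashable, Iterable, Tuple


def sum_by_level(items: Iterable[Tuple[Tuple[Hashable, ...], int]], level: int) -> Dict[Hashable, int]:
    """Sum values whose hierarchical labels share the same component at `level`."""
    totals: Dict[Hashable, int] = defaultdict(int)
    for label, value in items:
        totals[label[level]] += value
    # Groups appear in order of first occurrence.
    return dict(totals)


colors = ('red', 'green')
shapes = ('square', 'circle', 'triangle')
# Mirrors the Series above: labels from product(shapes, colors), values 0..5.
items = zip(product(shapes, colors), range(6))
print(sum_by_level(items, level=1))  # {'red': 6, 'green': 9}
```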

Frame.extend() supports Series

This must follow the addition of a name attribute since, given only a Series, a column name is needed; for now, extend_items() can be used.

Static Typing

I nominate mypy, since it's another project I'm becoming more involved with. I believe this will prove useful as StaticFrame begins interfacing with C code internally. Ideally, we'd use it as a CI job too.

The following modules pass mypy --strict:

  • doc/doc_build.py
  • doc/source/conf.py
  • setup.py
  • static_frame/__init__.py
  • static_frame/__main__.py
  • static_frame/core/__init__.py
  • static_frame/core/array_go.py
  • static_frame/core/display.py
  • static_frame/core/display_color.py
  • static_frame/core/display_html_datatables.py
  • static_frame/core/doc_str.py
  • static_frame/core/frame.py
  • static_frame/core/hloc.py
  • static_frame/core/index.py
  • static_frame/core/index_base.py
  • static_frame/core/index_hierarchy.py
  • static_frame/core/index_level.py
  • static_frame/core/iter_node.py
  • static_frame/core/operator_delegate.py
  • static_frame/core/series.py
  • static_frame/core/type_blocks.py
  • static_frame/core/util.py
  • static_frame/performance/__init__.py
  • static_frame/performance/core.py
  • static_frame/performance/main.py
  • static_frame/performance/perf_test.py
  • static_frame/performance/pydata_2018.py
  • static_frame/test/__init__.py
  • static_frame/test/property/__init__.py
  • static_frame/test/property/strategies.py
  • static_frame/test/property/test_index.py
  • static_frame/test/property/test_strategies.py
  • static_frame/test/property/test_type_blocks.py
  • static_frame/test/property/test_util.py
  • static_frame/test/test_case.py
  • static_frame/test/unit/__init__.py
  • static_frame/test/unit/test_array_go.py
  • static_frame/test/unit/test_display.py
  • static_frame/test/unit/test_display_color.py
  • static_frame/test/unit/test_doc.py
  • static_frame/test/unit/test_frame.py
  • static_frame/test/unit/test_index.py
  • static_frame/test/unit/test_index_hierarchy.py
  • static_frame/test/unit/test_index_level.py
  • static_frame/test/unit/test_main.py
  • static_frame/test/unit/test_series.py
  • static_frame/test/unit/test_type_blocks.py
  • static_frame/test/unit/test_util.py

Frame.clip()

Analog to Series.clip(), but the boundary objects will need to be Frames, not Series.
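Frame-shaped bounds are needed because, in two dimensions, each cell may have its own lower and upper bound, while a Series can supply only one bound per row or column. The sketch below assumes the values and both bounds are already aligned and have the same shape. It omits label alignment and models everything as nested lists. `clip_2d` is a hypothetical name.

```python
from typing import List, Sequence


def clip_2d(values: Sequence[Sequence[float]],
            lower: Sequence[Sequence[float]],
            upper: Sequence[Sequence[float]]) -> List[List[float]]:
    """Element-wise clip where each bound has the same shape as `values`."""
    if not (len(values) == len(lower) == len(upper)):
        raise ValueError('bounds must match the number of rows')
    result = []
    for row, lo_row, hi_row in zip(values, lower, upper):
        if not (len(row) == len(lo_row) == len(hi_row)):
            raise ValueError('bounds must match the number of columns')
        # Each cell is clipped against its own lower and upper bound.
        result.append([min(max(v, lo), hi) for v, lo, hi in zip(row, lo_row, hi_row)])
    return result


print(clip_2d([[1, 5], [10, -3]], [[2, 0], [0, 0]], [[4, 4], [8, 8]]))  # [[2, 4], [8, 0]]
```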

IndexHierarchy.drop_level()

Implement dropping from the root side, in addition to dropping from the leaves.
Reverse the notation and default to dropping from the root side.
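The sketch below illustrates one reading of this note, with hypothetical semantics. Labels are modeled as tuples, a positive `count` drops that many levels from the root (outer) side, and a negative `count` drops from the leaves. A real implementation would also have to deal with labels that become non-unique after dropping a level, which this sketch does not check.

```python
from typing import Hashable, List, Sequence, Tuple

Label = Tuple[Hashable, ...]


def drop_level(labels: Sequence[Label], count: int = 1) -> List[Label]:
    """Drop `count` levels from the root side; a negative count drops from the leaves."""
    if count >= 0:
        return [label[count:] for label in labels]
    return [label[:count] for label in labels]


labels = [('a', 1, 'x'), ('a', 2, 'y')]
print(drop_level(labels))      # [(1, 'x'), (2, 'y')]
print(drop_level(labels, -1))  # [('a', 1), ('a', 2)]
```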

Working with time-series data?

Hi, your interview with Michael Kennedy on Talk Python to Me piqued my interest. The library looks very interesting, especially the focus on removing ambiguity.

Most of my pandas work involves handling time-series data, specifically sensor data where the interval between samples isn't constant at all. I do lots of grouping by various time intervals, and time-aware interpolation.

After an admittedly cursory glance I didn’t see how I would do that kind of work with static-frame. Am I missing something, or haven’t you had reason/opportunity to consider that kind of use case?

Frame, Series, Index pickle handling

While basic pickling appears to work, NumPy arrays loaded from a pickle have flags.writeable set to True. Appropriate pickle handlers may need to be defined with the copyreg module.

Frame.set_index_hierarchy()

Permit creating a hierarchical index from columns already found in a Frame. There is no analog for Series, and having a specialized method is better than overloading the behavior of Frame.set_index().

Frame.to_csv doesn't preserve index label

Note that the index name is lost when saving to csv in the below example.

f = sf.Frame([1, 2, 3], columns=['a'], index=sf.Index(range(3), name='Important Name'))
f
<Frame>
<Index>                 a       <<U1>
<Index: Important Name>
0                       1
1                       2
2                       3
<int64>                 <int64>

f.to_csv('/tmp/tmp.txt')
sf.Frame.from_csv('/tmp/tmp.txt')
<Frame>
<Index> index   a       <<U5>
<Index>
0       0       1
1       1       2
2       2       3
<int64> <int64> <int64>
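Preserving the name through a CSV round trip takes two steps. First, the writer must put the index name in the header of the index column. Second, the reader must know that the first column is the index; the `from_csv` call above passes no `index_column` argument. The standard-library sketch below shows both steps. The helper names are hypothetical, the `'index'` fallback mirrors the header visible in the output above, and all values read back as strings because no type inference is performed.

```python
import csv
from io import StringIO
from typing import List, Optional, Sequence, Tuple


def write_csv(index: Sequence[object], index_name: Optional[str],
              columns: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Write a table, using the index name as the header of the index column."""
    buffer = StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow([index_name if index_name is not None else 'index', *columns])
    for label, row in zip(index, rows):
        writer.writerow([label, *row])
    return buffer.getvalue()


def read_csv(text: str) -> Tuple[str, List[str], List[str], List[List[str]]]:
    """Read back (index_name, columns, index, rows), treating column 0 as the index."""
    reader = csv.reader(StringIO(text))
    header = next(reader)
    index_name, columns = header[0], header[1:]
    index, rows = [], []
    for record in reader:
        index.append(record[0])
        rows.append(record[1:])
    return index_name, columns, index, rows


text = write_csv(range(3), 'Important Name', ['a'], [[1], [2], [3]])
print(read_csv(text)[0])  # Important Name
```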
