static-frame's Issues

Series.from_concat()

Support creating a Series from multiple Series, assuming non-overlapping indices.

astype interface on TypeBlocks, Series, Frame, Index

Support assigning types for multiple columns at once on TypeBlocks, so as to support consolidating (or isolating) blocks in TypeBlocks.

Possibly support getitem style selections on TypeBlocks on Frame, i.e., in a manner similar to assign():

>>> frame.astype[['b', 'g', 'f']](float)
>>> frame.astype['g':](bool)

Frame.to_xlsx, Frame.from_xlsx

Index.keys() returns the dict_keys() for the underlying map dictionary

By returning the keys of the Index, we can get a set-like object from the Index without creating a new container. This is useful in places where we OR multiple indices in a loop.
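
A minimal sketch of the idea in plain Python, not static-frame code: the views returned by dict.keys() already support set operators, so a union can be accumulated in a loop without first copying each mapping's keys into a set. The dictionaries below are hypothetical stand-ins for the label-to-position map held by each Index.

```python
# Hypothetical stand-ins for the label -> position mapping inside each Index.
index_maps = [
    {'a': 0, 'b': 1},
    {'b': 0, 'c': 1},
    {'c': 0, 'd': 1},
]

union = set()
for mapping in index_maps:
    # dict_keys is set-like: it can be OR-ed directly with the running
    # result, with no intermediate set(mapping) copy of the keys.
    union = mapping.keys() | union

print(sorted(union))  # ['a', 'b', 'c', 'd']
```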

`Series.from_items` to make data with `IndexHierarchy` indices

I want to make a Series with an IndexHierarchy index using Series.from_items. (I like from_items esp. for hard-coded data because it puts the indices directly next to their respective values, which is much easier to keep track of when reading & debugging code.)

For example, I want to do this:

series = sf.Series.from_items([
    (('one', 1), 'hello'),
    (('two', 2), 'cya'),
])

What I get is this series:

<Series>
<Index>
('one', 1) hello
('two', 2) cya
<object>   <<U5>

But what I want is this series:

<Series>
<IndexHierarchy>
one              1        hello
two              2        cya
<object>         <object> <<U5>

So:

Is there an existing method for doing this?
If not, is there a method that would convert from the series I made into the series I want? (Something like the opposite of reindex_flat?) If there isn't, would it be worth implementing?
If "no" to all of the above, could you change from_items to interpret tuples as entries for an IndexHierarchy, without breaking anything else?
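
One possible workaround, sketched under assumptions: keep the data in the from_items layout, then split it into label tuples and values before building the index. The splitting step below is plain Python. The commented-out lines assume that IndexHierarchy.from_labels accepts an iterable of label tuples and that the Series constructor accepts the result as its index, both of which should be checked against the installed version.

```python
items = [
    (('one', 1), 'hello'),
    (('two', 2), 'cya'),
]

# Split the (label, value) pairs into two parallel tuples.
labels, values = zip(*items)

print(labels)  # (('one', 1), ('two', 2))
print(values)  # ('hello', 'cya')

# Assumed usage with static-frame (not verified here):
# import static_frame as sf
# series = sf.Series(values, index=sf.IndexHierarchy.from_labels(labels))
```

This keeps each label next to its value in the source while leaving the choice of index type explicit.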

TypeBlocks.roll

Permit rolling on one or both axes simultaneously. The implementation will be very similar to TypeBlocks._drop_blocks().
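
To make the intended wrap-around behavior concrete, here is a plain-Python sketch of rolling on both axes at once. The function name roll_2d and its parameters are hypothetical, and it works on a list of rows rather than on blocks, so it illustrates the result only, not how TypeBlocks would implement it.

```python
def roll_2d(rows, shift_rows=0, shift_cols=0):
    """Return a new list of rows, rolled with wrap-around on both axes."""
    if not rows:
        return []
    r = shift_rows % len(rows)
    # Move the last r rows to the front (a plain copy when r == 0).
    rolled = rows[-r:] + rows[:-r] if r else list(rows)
    out = []
    for row in rolled:
        c = shift_cols % len(row) if row else 0
        # Move the last c values of each row to the front.
        out.append(row[-c:] + row[:-c] if c else list(row))
    return out


table = [
    [1, 2, 3],
    [4, 5, 6],
]
print(roll_2d(table, shift_rows=1, shift_cols=1))  # [[6, 4, 5], [3, 1, 2]]
```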

Implement max()/min() functionality on object dtypes

Display: permit independent width definition for index

Series.to_frame, Series.to_frame_go

Series.isin(): performance

C Merge Checklist

On the `c` branch:

Update setup.py with name="static-frame" (it's currently "C-SF", for testing).
Update .travis.yml to deploy off of the master branch.
Update .travis.yml to use the "real" PyPI.
Update .travis.yml with @flexatone's PyPI credentials (see me for this).
Optionally, update .travis.yml to deploy to PyPI only on tags. I'd rather just deploy whenever the version changes in setup.py (which is what it does now), but we can chat about that.

On the `master` branch:

Make a new minor release.
Merge c.
Use dev releases to make the necessary changes to our conda-forge recipe. This will be the last tricky part, but I'm pretty sure I've figured it out!

to_html() on Index, Series, Frame

PyArrow Integration

I listened to the TalkPython episode today that covered this library and I realized that this is the library I was looking for about a year ago. I have implemented parts of this library several times trying to address the pain points it tries to solve, so thank you!

Is the Apache Arrow project on your radar at all? It is very much aligned with the goals of this library and would provide several other features for free.

Add .name attribute to Series, Index, and Frame

IndexHierarchy.relabel_at_depth

Permit modifying an IndexHierarchy at a specified, integer depth.

Series.drop()

Analog to Frame.drop(); implement with np.delete().

Make consolidate_blocks an option for all relevant Frame constructors

Frame.to_xarray()

Explore options for exporting to xarray.

Series.head(); Series.tail()

Single row, single column Frame has unsized values

In : f = sf.FrameGO(dict(color=('black',)))
In : f['color']
Traceback (most recent call last):
  File "/usr/lib/python3.5/code.py", line 91, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File "/home/ariza/src/rapc/static_frame/static_frame/core/frame.py", line 1109, in __getitem__
    return self._extract(*self._compound_loc_to_getitem_iloc(key))
  File "/home/ariza/src/rapc/static_frame/static_frame/core/frame.py", line 1049, in _extract
    return Series(blocks.values[0], index=index)
  File "/home/ariza/src/rapc/static_frame/static_frame/core/series.py", line 160, in __init__
    if len(self.values) != shape:
TypeError: len() of unsized object

`Frame.set_index_hierarchy` fails for `Frame`s with only one row

For example:

sf.Frame.from_records([
    ('one', 1, 'hello'),
    ('two', 2, 'cya'),
], columns=['name', 'val', 'msg']).set_index_hierarchy(['name', 'val'], drop=True)

(correctly afaik) produces this frame:

<Frame>
<Index>                       msg   <<U4>
<Index...
one                  1        hello
two                  2        cya
<object>             <object> <<U5>

But if I remove the second row, like so:

sf.Frame.from_records([
    ('one', 1, 'hello'),
], columns=['name', 'val', 'msg']).set_index_hierarchy(['name', 'val'], drop=True)

this fails with the following traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/static_frame/core/frame.py", line 2083, in set_index_hierarchy
    index = IndexHierarchy.from_labels(index_labels, name=column_name)
  File "/usr/local/lib/python3.5/dist-packages/static_frame/core/index_hierarchy.py", line 586, in from_labels
    for d, v in enumerate(label):
TypeError: 'int' object is not iterable

One detail to note is that it seems to fail on the first non-iterable set_index_hierarchy column value; it just so happens that the first is 'one' which is iterable.

Is this in some way an intended behavior?
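
One reading that is consistent with the traceback, though not confirmed against the source: with a single row, the labels may arrive as a flat sequence rather than as a sequence of tuples, so the loop over each label iterates a str and then an int. The sketch below reproduces that pattern in plain Python; iter_depths is a hypothetical stand-in for the loop in from_labels.

```python
def iter_depths(labels):
    """Mimic a loop that expects every label to be an iterable of depth values."""
    seen = []
    for label in labels:
        for depth, value in enumerate(label):
            seen.append((depth, value))
    return seen


# Two rows: each label is a tuple, so iteration succeeds.
print(iter_depths([('one', 1), ('two', 2)]))

# Hypothetical collapsed input for a single row: the labels arrive flat.
# 'one' is a str and iterates character by character; 1 then raises.
try:
    iter_depths(['one', 1])
except TypeError as e:
    print(e)  # 'int' object is not iterable
```

This would also explain why the failure appears only at the first non-iterable value.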

Series.to_frame_go() produces an invalid FrameGO when axis=0

a = sf.Series((1, 2, 3))
fg = a.to_frame_go(axis=0)
fg['b'] = 'b'

Traceback (most recent call last):
 File "/home/rutherford/src/rapc/static_frame/static_frame/core/frame.py", line 2561, in __setitem__
   self._columns.append(key)
builtins.AttributeError: 'Index' object has no attribute 'append'

Frame.pivot()

Support pivot style transformation of Frame.

sort_values(), sort_index(), take a `key` argument?

Frame.to_frame_go does not properly handle IndexHierarchy

Frame.from_csv(): colons are removed if found in column names

Frame.fillna_leading(), Frame.fillna_trailing(), Frame.fillna_backward(), Frame.fillna_forward()

relabel_with_suffix, relabel_with_prefix for Frame, Series

Similar to DF.add_prefix, DF.add_suffix.

TypeBlocks.drop()

An optimized drop implemented on TypeBlocks, giving Frame.set_index() and Frame.set_index_hierarchy() a more efficient way to drop columns.

Frame.from_delimited

New (or reused) implementations with superior performance.

`dtypes` kwarg for `from_*` constructors accepting mappings/iterables/arrays

When using Frame.from_records, I found myself wishing I could pass a mapping of column name to dtype for the `dtypes` kwarg.

Frame.drop()

Permit independent index, columns arguments, like Frame.reindex(); implement with TypeBlocks._drop_blocks().

Using `header_is_columns` on `sf.Frame.from_csv` causes exception

A minimal example:

from io import StringIO

import static_frame as sf

INPUT_STR = '''
196412	0.0
196501	0.0
196502	0.0
196503	0.0
196504	0.0
196505	0.0'''

if __name__ == '__main__':
    works_but_wrong_headers = sf.Frame.from_tsv(StringIO(INPUT_STR), index_column=0)
    desired_but_fails = sf.Frame.from_tsv(StringIO(INPUT_STR), index_column=0, header_is_columns=False)

The exception I observe:

Traceback (most recent call last):
  File "/home/<username>/sf_test_case.py", line 15, in <module>
    desired_but_fails = sf.Frame.from_tsv(StringIO(INPUT_STR), index_column=0, header_is_columns=False)
  File "/home/<username>/src/rapc/static_frame/static_frame/core/frame.py", line 713, in from_tsv
    return cls.from_csv(fp, delimiter='\t', **kwargs)
  File "/home/<username>/src/rapc/static_frame/static_frame/core/frame.py", line 696, in from_csv
    missing_values={''},
  File "/home/<username>/.env37/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1771, in genfromtxt
    names = list(names)
TypeError: 'bool' object is not iterable
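
The last frame of the traceback shows list(names) being called on a bool, which suggests that the header_is_columns value reaches genfromtxt's names argument unchanged. Below is a minimal sketch of one possible guard, in plain Python. The helper names_argument is hypothetical, and mapping False to None relies on None being genfromtxt's documented value for "no names".

```python
def names_argument(header_is_columns):
    """Map a boolean flag to a value that is safe to pass as `names`.

    True means "read names from the first line"; None means "no names".
    False is never passed through, because list(False) raises TypeError.
    """
    return True if header_is_columns else None


# The failure from the traceback, reproduced without numpy:
try:
    list(False)
except TypeError as e:
    print(e)  # 'bool' object is not iterable

print(names_argument(True))   # True
print(names_argument(False))  # None
```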

"groupby" aggregation on index hierarchy levels

I have a static_frame.Series, like so:

import static_frame as sf
import numpy as np
    
colors = ('red', 'green')
shapes = ('square', 'circle', 'triangle')
    
series = sf.Series(np.arange(6), index=sf.IndexHierarchy.from_product(shapes, colors))

I want to be able to perform some "groupby" function, like summing all like-color values together into a new Series that looks something like this:

<IndexHierarchy>       <Series>
red   6
green 9
<<U8> <int64>

With a pandas.Series, I could do this:

agg_series = series.groupby(level=1).sum()

However as far as I can tell, there's no way to do this with static_frame objects (without building the result manually). The closest functionality I can find is the group iterator functions, but those don't let you group by index hierarchy levels.

I understand that static_frame is not intended as a drop-in replacement for pandas, but this seems like a desirable feature for any data-crunching library.
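
For reference, "building the result manually" can be sketched in plain Python as below. It assumes that IndexHierarchy.from_product(shapes, colors) orders its labels the same way itertools.product does, and it uses range(6) in place of np.arange(6); neither static_frame nor numpy is needed to run it.

```python
from collections import defaultdict
from itertools import product

colors = ('red', 'green')
shapes = ('square', 'circle', 'triangle')

# Assumed to match the label order of IndexHierarchy.from_product(shapes, colors),
# paired with the values 0..5 from np.arange(6).
labels = list(product(shapes, colors))
values = range(6)

totals = defaultdict(int)
for (shape, color), value in zip(labels, values):
    totals[color] += value  # group on depth 1 (color)

print(dict(totals))  # {'red': 6, 'green': 9}
```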

Frame.extend() supports Series

This must follow the addition of a name attribute, as given only a Series we need a column name; for now, extend_items() can be used.

make `Series`, `Frame` hashable

Since static_frame objects are immutable, is there any reason they can't (or shouldn't) implement __hash__?

Static Typing

I nominate mypy, since it's another project I'm becoming more involved with. I believe this will prove useful as StaticFrame begins interfacing with C code internally. Ideally, we'd use it as a CI job too.

The following modules pass mypy --strict:

Series.to_pickle(), Series.from_pickle()

Support Index objects in getitem calls.

Series.fillna_leading(), Series.fillna_trailing(), Series.fillna_backward(), Series.fillna_forward()

Family of functions for more refined NaN filling.
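
The semantics are not spelled out here, so the following plain-Python sketch assumes them from the names: "forward" propagates the last non-NaN value into later NaN positions, and "leading" replaces only the run of NaN at the start with a given value. The function names and signatures are hypothetical, and "backward" and "trailing" would be the mirror images.

```python
import math


def is_nan(x):
    return isinstance(x, float) and math.isnan(x)


def fill_forward(values):
    """Replace each NaN with the most recent non-NaN value before it."""
    out, last, seen = [], None, False
    for v in values:
        if is_nan(v):
            # Leading NaN stay NaN: there is nothing to propagate yet.
            out.append(last if seen else v)
        else:
            last, seen = v, True
            out.append(v)
    return out


def fill_leading(values, fill):
    """Replace only the run of NaN at the start of the sequence."""
    out = list(values)
    for i, v in enumerate(out):
        if not is_nan(v):
            break
        out[i] = fill
    return out


nan = float('nan')
data = [nan, 1.0, nan, nan, 4.0, nan]
print(fill_forward(data))       # [nan, 1.0, 1.0, 1.0, 4.0, 4.0]
print(fill_leading(data, 0.0))  # [0.0, 1.0, nan, nan, 4.0, nan]
```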

Frame.clip()

Analog to Series.clip(), but boundary objects will need to be Frames, not Series.

IndexHierarchy.drop_level()

Implement dropping from the root side, in addition to dropping from the leaves.
Reverse the notation and default to dropping from the root side.

Working with time-series data?

Hi, my interest was piqued by your interview with Michael Kennedy on Talk Python to Me. The library looks very interesting, especially the focus on removing ambiguity.

Most of my pandas work involves handling time-series data, specifically sensor data where the time between samples isn’t at all constant. I do lots of grouping by various time intervals, and time-aware interpolations.

After an admittedly cursory glance I didn’t see how I would do that kind of work with static-frame. Am I missing something, or haven’t you had reason/opportunity to consider that kind of use case?

f = sf.Frame([1, 2, 3], columns=['a'], index=sf.Index(range(3), name='Important Name'))
f
<Frame>
<Index>                 a       <<U1>
<Index: Important Name>
0                       1
1                       2
2                       3
<int64>                 <int64>

f.to_csv('/tmp/tmp.txt')
sf.Frame.from_csv('/tmp/tmp.txt')
<Frame>
<Index> index   a       <<U5>
<Index>
0       0       1
1       1       2
2       2       3
<int64> <int64> <int64>

.to_pandas(), .from_pandas() methods support sf.IndexHierarchy to pd.MultiIndex

Series.dropna() not working as expected on hierarchical-index series

MVCE:

import static_frame as sf
import numpy as np

f = sf.Series(np.nan, index=sf.IndexHierarchy.from_product(['A', 'B'], [1, 2]))
f.dropna()

The output is the unmodified series, with NaNs still present. Surprisingly, Series.fillna() works as expected.
