static-frame / static-frame
Immutable and statically-typeable DataFrames with runtime type and data validation
Home Page: https://staticframe.dev
License: Other
Support creating a Series from multiple Series, assuming non-overlapping indices.
Support assigning types for multiple columns at once on TypeBlocks, so as to support consolidating (or isolating) blocks in TypeBlocks.
Possibly support getitem style selections on TypeBlocks on Frame, i.e., in a manner similar to assign():
>>> frame.astype[['b', 'g', 'f']](float)
>>> frame.astype['g':](bool)
By returning the keys of the Index, we can get a set-like object from the Index without creating a new container, useful in places where we, in a loop, OR multiple indices.
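A minimal sketch of the idea, using plain dicts as a stand-in for an Index's label-to-position mapping. This is an assumption for illustration: the names below are hypothetical and none of this is static-frame API. A dict's keys() view already supports set operations, so labels from several mappings can be combined without first copying each one into a new container:

```python
# Hypothetical stand-ins for Index objects: each dict maps label -> position.
index_a = {'a': 0, 'b': 1}
index_b = {'b': 0, 'c': 1}
index_c = {'c': 0, 'd': 1}

# OR (union) the labels of several indices in a loop.
union = set()
for mapping in (index_a, index_b, index_c):
    # keys() is a set-like view over the dict; no per-index copy is made here.
    union.update(mapping.keys())

# Views also support set operators directly, e.g. intersection.
shared = index_a.keys() & index_b.keys()

print(sorted(union))
print(sorted(shared))
```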
I want to make a Series with an IndexHierarchy index using Series.from_items. (I like from_items esp. for hard-coded data because it puts the indices directly next to their respective values, which is much easier to keep track of when reading & debugging code.)
For example, I want to do this:
series = sf.Series.from_items([
(('one', 1), 'hello'),
(('two', 2), 'cya'),
])
What I get is this series:
<Series>
<Index>
('one', 1) hello
('two', 2) cya
<object> <<U5>
But what I want is this series:
<Series>
<IndexHierarchy>
one 1 hello
two 2 cya
<object> <object> <<U5>
So:
[…] reindex_flat?) If there isn't, would it be worth implementing?
[…] from_items to interpret tuples as entries for an IndexHierarchy, without breaking anything else?
Permit rolling on one or both axes simultaneously. Implementation will be very similar to TypeBlocks._drop_blocks().
c branch:
[…] setup.py with name="static-frame" (it's currently "C-SF", for testing).
[…] .travis.yml to deploy off of the master branch.
[…] .travis.yml to use the "real" PyPI.
[…] .travis.yml with @flexatone's PyPI credentials (see me for this).
[…] .travis.yml to deploy to PyPI only on tags. I'd rather just deploy whenever the version changes in setup.py (which is what it does now), but we can chat about that.
master branch:
[…] c.
I listened to the TalkPython episode today that covered this library and I realized that this is the library I was looking for about a year ago. I have implemented parts of this library several times trying to address the pain points it tries to solve, so thank you!
Is the Apache Arrow project on your radar at all? It is very much aligned with the goals of this library and would provide several other features for free.
Permit modifying an IndexHierarchy at a specified, integer depth.
Analog to Frame.drop(); implement with np.delete().
Explore options for exporting to xarray.
In : f = sf.FrameGO(dict(color=('black',)))
In : f['color']
Traceback (most recent call last):
File "/usr/lib/python3.5/code.py", line 91, in runcode
exec(code, self.locals)
File "<console>", line 1, in <module>
File "/home/ariza/src/rapc/static_frame/static_frame/core/frame.py", line 1109, in __getitem__
return self._extract(*self._compound_loc_to_getitem_iloc(key))
File "/home/ariza/src/rapc/static_frame/static_frame/core/frame.py", line 1049, in _extract
return Series(blocks.values[0], index=index)
File "/home/ariza/src/rapc/static_frame/static_frame/core/series.py", line 160, in __init__
if len(self.values) != shape:
TypeError: len() of unsized object
For example:
sf.Frame.from_records([
('one', 1, 'hello'),
('two', 2, 'cya'),
], columns=['name', 'val', 'msg']).set_index_hierarchy(['name', 'val'], drop=True)
(correctly afaik) produces this frame:
<Frame>
<Index> msg <<U4>
<Index...
one 1 hello
two 2 cya
<object> <object> <<U5>
But if I remove the second row, like so:
sf.Frame.from_records([
('one', 1, 'hello'),
], columns=['name', 'val', 'msg']).set_index_hierarchy(['name', 'val'], drop=True)
this fails with the following traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/static_frame/core/frame.py", line 2083, in set_index_hierarchy
index = IndexHierarchy.from_labels(index_labels, name=column_name)
File "/usr/local/lib/python3.5/dist-packages/static_frame/core/index_hierarchy.py", line 586, in from_labels
for d, v in enumerate(label):
TypeError: 'int' object is not iterable
One detail to note is that it seems to fail on the first non-iterable set_index_hierarchy column value; it just so happens that the first is 'one', which is iterable.
Is this in some way an intended behavior?
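The observation above is consistent with each label being iterated as if it were a tuple of per-depth values. The plain-Python sketch below only illustrates that behaviour; iter_depths is a hypothetical helper, and this is not the static_frame code path:

```python
def iter_depths(label):
    """Pair each element of a label with its depth (hypothetical helper)."""
    return list(enumerate(label))

# A tuple label has one element per depth, which is the expected input.
tuple_result = iter_depths(('one', 1))

# A bare string is also iterable, so it is silently split into characters.
str_result = iter_depths('one')

# A bare int is not iterable, which raises the same kind of TypeError
# as the one shown in the traceback.
try:
    iter_depths(1)
except TypeError as exc:
    error_message = str(exc)

print(tuple_result)
print(str_result)
print(error_message)
```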
a = sf.Series((1, 2, 3))
fg = a.to_frame_go(axis=0)
fg['b'] = 'b'
Traceback (most recent call last):
File "/home/rutherford/src/rapc/static_frame/static_frame/core/frame.py", line 2561, in __setitem__
self._columns.append(key)
builtins.AttributeError: 'Index' object has no attribute 'append'
Support pivot style transformation of Frame.
Similar to DF.add_prefix, DF.add_suffix.
Optimized implementation of drop implemented on TypeBlocks, permitting Frame.set_index() and Frame.set_index_hierarchy() better dropping.
New (or reused) implementations with superior performance.
When using Frame.from_records, I found myself wishing I could pass a mapping of column name to dtype for the dtypes kwarg.
Permit independent index, columns arguments, like Frame.reindex(); implement with TypeBlocks._drop_blocks().
A minimal example:
from io import StringIO
import static_frame as sf
INPUT_STR = '''
196412 0.0
196501 0.0
196502 0.0
196503 0.0
196504 0.0
196505 0.0'''
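if __name__ == '__main__':
    # Parses, but the first data row is consumed as column headers.
    works_but_wrong_headers = sf.Frame.from_tsv(StringIO(INPUT_STR), index_column=0)
    # Intended call: treat every row as data; this raises the exception below.
    desired_but_fails = sf.Frame.from_tsv(StringIO(INPUT_STR), index_column=0, header_is_columns=False)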
The exception I observe:
Traceback (most recent call last):
File "/home/<username>/sf_test_case.py", line 15, in <module>
desired_but_fails = sf.Frame.from_tsv(StringIO(INPUT_STR), index_column=0, header_is_columns=False)
File "/home/<username>/src/rapc/static_frame/static_frame/core/frame.py", line 713, in from_tsv
return cls.from_csv(fp, delimiter='\t', **kwargs)
File "/home/<username>/src/rapc/static_frame/static_frame/core/frame.py", line 696, in from_csv
missing_values={''},
File "/home/<username>/.env37/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1771, in genfromtxt
names = list(names)
TypeError: 'bool' object is not iterable
I have a static_frame.Series, like so:
import static_frame as sf
import numpy as np
colors = ('red', 'green')
shapes = ('square', 'circle', 'triangle')
series = sf.Series(np.arange(6), index=sf.IndexHierarchy.from_product(shapes, colors))
I want to be able to perform some "groupby" function, like summing all like-color values together into a new Series that looks something like this:
<IndexHierarchy> <Series>
red 6
green 9
<<U8> <int64>
With a pandas.Series, I could do this:
agg_series = series.groupby(level=1).sum()
However, as far as I can tell, there's no way to do this with static_frame objects (without building the result manually). The closest functionality I can find is the group iterator functions, but those don't let you group by index hierarchy levels.
I understand that static_frame is not intended as a drop-in replacement for pandas, but this seems like a desirable feature for any data-crunching library.
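For reference, the manual construction mentioned above can be sketched in plain Python. This only illustrates the grouping logic and is not static_frame API: sum_by_level is a hypothetical helper, and it works on the label tuples and values directly rather than on a Series.

```python
from collections import defaultdict
from itertools import product

colors = ('red', 'green')
shapes = ('square', 'circle', 'triangle')

# Mirror the Series above: labels are (shape, color) tuples, values are 0..5.
labels = list(product(shapes, colors))
values = range(6)

def sum_by_level(labels, values, level):
    """Sum the values whose labels match at the given hierarchy depth."""
    totals = defaultdict(int)
    for label, value in zip(labels, values):
        totals[label[level]] += value
    return dict(totals)

# Group on depth 1 (the color), the analog of groupby(level=1).sum().
agg = sum_by_level(labels, values, level=1)
print(agg)  # {'red': 6, 'green': 9}
```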
This must follow the addition of a name attribute, as given only a Series we need a column name; for now, extend_items() can be used.
Since static_frame objects are immutable, is there any reason they can't (or shouldn't) implement __hash__?
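To make the question concrete, here is a minimal sketch of what hashing an immutable container involves. FrozenPairs is a hypothetical class written for illustration, not a static_frame type: the hash is derived from the same immutable state that __eq__ compares, so equal objects hash equally.

```python
class FrozenPairs:
    """Hypothetical immutable container of labels and values."""
    __slots__ = ('_labels', '_values')

    def __init__(self, labels, values):
        # Bypass the blocking __setattr__ below during construction only.
        object.__setattr__(self, '_labels', tuple(labels))
        object.__setattr__(self, '_values', tuple(values))

    def __setattr__(self, name, value):
        raise AttributeError('FrozenPairs is immutable')

    def __eq__(self, other):
        if not isinstance(other, FrozenPairs):
            return NotImplemented
        return (self._labels, self._values) == (other._labels, other._values)

    def __hash__(self):
        # Hash the same state that __eq__ compares.
        return hash((self._labels, self._values))

a = FrozenPairs(('x', 'y'), (1, 2))
b = FrozenPairs(('x', 'y'), (1, 2))
lookup = {a: 'cached result'}
print(lookup[b])  # an equal object finds the same entry
```

One thing this sketch glosses over: hashing relies on __eq__ returning a single bool, whereas NumPy-style containers typically return element-wise results from ==, so the two behaviours would need to be reconciled.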
I nominate mypy, since it's another project I'm becoming more involved with. I believe this will prove useful as StaticFrame begins interfacing with C code internally. Ideally, we'd use it as a CI job too.
The following modules pass mypy --strict:
doc/doc_build.py
doc/source/conf.py
setup.py
static_frame/__init__.py
static_frame/__main__.py
static_frame/core/__init__.py
static_frame/core/array_go.py
static_frame/core/display.py
static_frame/core/display_color.py
static_frame/core/display_html_datatables.py
static_frame/core/doc_str.py
static_frame/core/frame.py
static_frame/core/hloc.py
static_frame/core/index.py
static_frame/core/index_base.py
static_frame/core/index_hierarchy.py
static_frame/core/index_level.py
static_frame/core/iter_node.py
static_frame/core/operator_delegate.py
static_frame/core/series.py
static_frame/core/type_blocks.py
static_frame/core/util.py
static_frame/performance/__init__.py
static_frame/performance/core.py
static_frame/performance/main.py
static_frame/performance/perf_test.py
static_frame/performance/pydata_2018.py
static_frame/test/__init__.py
static_frame/test/property/__init__.py
static_frame/test/property/strategies.py
static_frame/test/property/test_index.py
static_frame/test/property/test_strategies.py
static_frame/test/property/test_type_blocks.py
static_frame/test/property/test_util.py
static_frame/test/test_case.py
static_frame/test/unit/__init__.py
static_frame/test/unit/test_array_go.py
static_frame/test/unit/test_display.py
static_frame/test/unit/test_display_color.py
static_frame/test/unit/test_doc.py
static_frame/test/unit/test_frame.py
static_frame/test/unit/test_index.py
static_frame/test/unit/test_index_hierarchy.py
static_frame/test/unit/test_index_level.py
static_frame/test/unit/test_main.py
static_frame/test/unit/test_series.py
static_frame/test/unit/test_type_blocks.py
static_frame/test/unit/test_util.py
Family of functions for more refined NaN filling.
Analog to Series.clip(), but boundary objects will need to be Frames, not Series.
Implement dropping from the root side, in addition to from the leaves.
Reverse notation and default to drop from root side.
Hi, got my interest piqued by your interview with Michael Kennedy at Talk Python to Me. The library looks very interesting, especially the focus on removing ambiguity.
Most of my pandas work involves handling time-series data, specifically sensor data where the time between samples isn't at all constant. I do lots of grouping by various time intervals, and time-aware interpolations.
After an admittedly cursory glance I didn’t see how I would do that kind of work with static-frame. Am I missing something, or haven’t you had reason/opportunity to consider that kind of use case?
While basic pickling appears to work, post-pickle-loaded NumPy arrays have flags.writeable set to True. Might need appropriate pickle handlers defined for the copyreg module.
Permit creating a hierarchical index from columns already found in a Frame. There is no analog for Series, and having a specialized method is better than overloading the behavior of Frame.set_index().
Extending Display to support HTML output, implement _repr_html_ methods on containers.
Note that the index name is lost when saving to csv in the below example.
f = sf.Frame([1, 2, 3], columns=['a'], index=sf.Index(range(3), name='Important Name'))
f
<Frame>
<Index> a <<U1>
<Index: Important Name>
0 1
1 2
2 3
<int64> <int64>
f.to_csv('/tmp/tmp.txt')
sf.Frame.from_csv('/tmp/tmp.txt')
<Frame>
<Index> index a <<U5>
<Index>
0 0 1
1 1 2
2 2 3
<int64> <int64> <int64>
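One possible convention for keeping the name, sketched with the standard library's csv module rather than static_frame: write the index name as the header of the first column, and read it back from there. This is an assumption about how a round trip could work, not a description of what to_csv or from_csv do.

```python
import csv
import io

index_name = 'Important Name'
index = [0, 1, 2]
column = [1, 2, 3]

# Write: use the index name, rather than a generic "index" label,
# as the header of the first column.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([index_name, 'a'])
writer.writerows(zip(index, column))

# Read: treat the first header cell as the index name.
buffer.seek(0)
reader = csv.reader(buffer)
header = next(reader)
restored_name = header[0]
rows = [(int(i), int(v)) for i, v in reader]

print(restored_name)
print(rows)
```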
MVCE:
import static_frame as sf
import numpy as np
f = sf.Series(np.nan, index=sf.IndexHierarchy.from_product(['A', 'B'], [1, 2]))
f.dropna()
The output will be the unmodified series, with NaNs still present. Surprisingly, Series.fillna() works correctly as expected.