bjherger / keras-pandas
keras-pandas allows users to rapidly build and iterate on deep learning models.
License: MIT License
Acceptance criteria:
If DataFrame column names have spaces or parentheses, the fit() function raises an exception:
import pandas
import keras_pandas.Automater
data_good = pandas.DataFrame({'length': [1.0]})
data_bad = pandas.DataFrame({'length (cm)': [1.0]})
auto_good = keras_pandas.Automater.Automater(numerical_vars=['length'])
auto_bad = keras_pandas.Automater.Automater(numerical_vars=['length (cm)'])
auto_good.fit(data_good)
auto_bad.fit(data_bad)
Full traceback:
>>> auto_bad.fit(data_bad)
/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras_pandas/constants.py:33: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
transformed = input_dataframe[variable].as_matrix()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras_pandas/Automater.py", line 92, in fit
input_layers, input_nub = self._create_input_nub(self._variable_type_dict, input_variables_df)
File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras_pandas/Automater.py", line 272, in _create_input_nub
variable_input, variable_input_nub_tip = variable_type_handler(variable, input_dataframe)
File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras_pandas/constants.py", line 42, in input_nub_numeric_handler
input_layer = keras.Input(shape=(input_sequence_length,), dtype='float32', name='input_{}'.format(variable))
File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras/engine/input_layer.py", line 178, in Input
input_tensor=tensor)
File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras/engine/input_layer.py", line 87, in __init__
name=self.name)
File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 517, in placeholder
x = tf.placeholder(dtype, shape=shape, name=name)
File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1745, in placeholder
return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5020, in placeholder
"Placeholder", dtype=dtype, shape=shape, name=name)
File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 394, in _apply_op_helper
with g.as_default(), ops.name_scope(name) as scope:
File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6040, in __enter__
return self._name_scope.__enter__()
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 4004, in name_scope
raise ValueError("'%s' is not a valid scope name" % name)
ValueError: 'input_length (cm)' is not a valid scope name
I just fell over this using the Iris dataset from Sklearn. If it's an intended constraint it should be documented, but I suspect it's not.
This is using Python 3.6.5, keras-pandas 2.2.0, pandas 0.23.4.
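One possible workaround, until the constraint is either lifted or documented, is to rename the offending columns before handing the DataFrame to the Automater. The sketch below is only an illustration: `sanitize_column_name` and `build_rename_map` are hypothetical helpers, not part of keras-pandas, and the set of allowed characters (letters, digits, underscore, dot) is an assumption chosen to be conservative.

```python
import re


def sanitize_column_name(name):
    """Return a copy of `name` that is safe to embed in a layer name.

    Assumption: replacing every character outside [A-Za-z0-9_.] with an
    underscore is acceptable. This helper is hypothetical and is not part
    of keras-pandas.
    """
    cleaned = re.sub(r'[^A-Za-z0-9_.]', '_', name)
    # Collapse runs of underscores and trim them from both ends.
    return re.sub(r'_+', '_', cleaned).strip('_')


def build_rename_map(columns):
    """Map original column names to sanitized ones, rejecting collisions."""
    mapping = {column: sanitize_column_name(column) for column in columns}
    if len(set(mapping.values())) != len(mapping):
        raise ValueError('Sanitized column names collide: {}'.format(mapping))
    return mapping


rename_map = build_rename_map(['length (cm)', 'width'])
print(rename_map)
```

The resulting mapping can be passed to `data_bad.rename(columns=rename_map)`, and the sanitized names used in `numerical_vars`.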
Acceptance criteria:
Resources:
Boolean input handler should be the same as categorical, or boolean types should be removed
Hi,
Thanks for the time you are putting in to make life easier between pandas and keras!
I was checking out the code and the current issues, and I saw that you are using the original keras module and not the version inside TensorFlow core.
Is there a particular reason for this? If not, what would be your thoughts about changing it?
Add following links to README:
There should be a single function, which:
Create a screen capture tutorial, w/ voice over, showing how to use the project.
Transient unit test issue
======================================================================
FAIL: test_transform_no_response (testtext.TestText)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/build/bjherger/keras-pandas/tests/testtext.py", line 46, in test_transform_no_response
self.assertCountEqual([2, 3, 4, 5], list(X[0][0]))
AssertionError: Element counts were not equal:
First has 1, Second has 0: 5
First has 0, Second has 1: 894
----------------------------------------------------------------------
Move contributing info from README to a separate contributing.md file
This project should have at least two examples
It's a bit catchier.
AC:
Rename test files to be consistent w/ Google's Python Style Guide
Find and fix transient issue:
======================================================================
FAIL: test_transform_no_response (testtext.TestText)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/build/bjherger/keras-pandas/tests/testtext.py", line 46, in test_transform_no_response
self.assertCountEqual([2, 3, 4, 5], list(X[0][0]))
AssertionError: Element counts were not equal:
First has 1, Second has 0: 5
First has 0, Second has 1: 43
----------------------------------------------------------------------
Ran 23 tests in 84.650s
One variable should set version numbering in:
One option might be: https://github.com/warner/python-versioneer, or SemVer
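A lighter-weight alternative to a dedicated tool is to keep the version string in one module and have everything else read it from there. The sketch below assumes the version lives in a module-level `__version__` assignment; `parse_version` and the example file contents are hypothetical and only illustrate the idea.

```python
import re


def parse_version(source_text):
    """Extract the value of a module-level __version__ assignment."""
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", source_text, re.MULTILINE)
    if match is None:
        raise ValueError('No __version__ assignment found')
    return match.group(1)


# Hypothetical contents of the one file that owns the version number.
example_source = "# keras_pandas/__init__.py (assumed location)\n__version__ = '2.2.0'\n"
print(parse_version(example_source))
```

setup.py and the docs configuration could then call the same function on that file instead of repeating the number.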
Transformations
Modeling
Code base should still meet existing unittests, including those for numeric data types.
Travis CI/CD has issues with latest version of tensorflow.
Options include:
Perhaps via https://docs.travis-ci.com/user/deployment/pypi/
Setup.py requirements should allow for minor version differences, or otherwise allow for small version changes.
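One way to loosen exact pins is the compatible-release operator `~=`, which for a three-part version accepts later patch releases of the same minor version. The helper below is a hypothetical sketch, not something the project ships; it rewrites `==` pins and leaves every other requirement untouched.

```python
def relax_pin(requirement):
    """Turn an exact pin such as 'pkg==1.2.3' into 'pkg~=1.2.3'.

    '~=1.2.3' means '>=1.2.3, ==1.2.*'. A single-segment version cannot be
    used with '~=', so such pins are returned unchanged.
    """
    name, separator, version = requirement.partition('==')
    version = version.strip()
    if not separator or '.' not in version:
        return requirement
    return '{}~={}'.format(name.strip(), version)


# Illustrative input only; the real requirements list is not shown here.
print([relax_pin(r) for r in ['pandas==0.23.4', 'numpy']])
```

Dropping the last segment by hand (for example `~=0.23`) would loosen the constraint further, to any later release within the same major version.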
Acceptance criteria:
__init__ to internal state
_check_input_dataframe_columns_, to check that the input dataframe has the required columns
$ pip install keras-pandas
Collecting keras-pandas
Using cached https://files.pythonhosted.org/packages/b6/4f/cd2e9c9d25024bc76d8806966bc128d4f24e37d7fb64d6ab8f7ed9422601/keras-pandas-1.3.3.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-v828ze62/keras-pandas/setup.py", line 12, in <module>
with open('requirements.txt') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-v828ze62/keras-pandas/
Fails on python 2.7 and 3.5.2
requirements.txt seems to be missing from the tarball. You probably need to explicitly create a MANIFEST.in to pick it up.
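For the packaging side, a MANIFEST.in containing the line `include requirements.txt` is the usual way to get the file into the source distribution. On the setup.py side, resolving the file relative to setup.py itself rather than the current working directory makes the read less fragile. The function below is a sketch with a hypothetical name, not the project's actual setup code.

```python
import os


def read_requirements(base_dir, filename='requirements.txt'):
    """Read requirement lines from `filename` inside `base_dir`.

    Blank lines and comment lines are skipped. A missing file still raises,
    so a broken sdist is noticed instead of silently losing dependencies.
    """
    path = os.path.join(base_dir, filename)
    with open(path) as handle:
        lines = [line.strip() for line in handle]
    return [line for line in lines if line and not line.startswith('#')]


# In setup.py this would typically be called as:
#   read_requirements(os.path.dirname(os.path.abspath(__file__)))
```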
Read the Docs seems to only capture latest. It would be helpful to keep documentation for all versions.
Create README file, with the following sections:
Support for timestamp output type, via a default timestamp output layer
When trying to replicate the example, I receive this error. It looks like the sklearn_pandas module imported by Automater cannot be found.
from keras import Model
from keras.layers import Dense
from keras_pandas.Automater import Automater
from keras_pandas.lib import load_titanic
observations = load_titanic()
# Transform the data set, using keras_pandas
categorical_vars = ['pclass', 'sex', 'survived']
numerical_vars = ['age', 'siblings_spouses_aboard', 'parents_children_aboard', 'fare']
text_vars = ['name']
auto = Automater(categorical_vars=categorical_vars, numerical_vars=numerical_vars, text_vars=text_vars,
                 response_var='survived')
X, y = auto.fit_transform(observations)
# Start model with provided input nub
x = auto.input_nub
# Fill in your own hidden layers
x = Dense(32)(x)
x = Dense(32, activation='relu')(x)
x = Dense(32)(x)
# End model with provided output nub
x = auto.output_nub(x)
model = Model(inputs=auto.input_layers, outputs=x)
model.compile(optimizer='Adam', loss=auto.loss, metrics=['accuracy'])
# Train model
model.fit(X, y, epochs=4, validation_split=.2)
The traceback:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-509-d3a513b032e9> in <module>()
2 from keras.layers import Dense
3
----> 4 from keras_pandas.Automater import Automater
5 from keras_pandas.lib import load_titanic
6
~\Documents\python_hub\lib\site-packages\keras_pandas\Automater.py in <module>()
5 from keras.engine import Layer
6 from keras.layers import Concatenate, Dense
----> 7 from sklearn_pandas import DataFrameMapper
8
9 from keras_pandas import constants, lib
ModuleNotFoundError: No module named 'sklearn_pandas'
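The traceback suggests that sklearn_pandas was not installed alongside keras-pandas; installing it manually (`pip install sklearn-pandas`) is the likely workaround. A quick way to confirm which imports are missing in a given environment is sketched below. `missing_modules` is a hypothetical helper, and the list of modules to check is an assumption based on the imports visible in the traceback.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Modules taken from the import lines shown in the traceback above.
print(missing_modules(['keras', 'sklearn_pandas']))
```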
Build out time series support:
contributing.md
It's time
Add ability for CategoricalImputer to replace new labels w/ some sentinel value.
Create a read the docs page
Consistent examples, including:
contributing.md
Categorical imputer that fills missing values w/ 'UNK'
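The requested behavior can be sketched in plain Python as follows. This is not the existing CategoricalImputer; `SentinelCategoricalImputer` is a hypothetical stand-in that works on lists, and it assumes that both missing values and labels unseen during fit should map to the same sentinel.

```python
class SentinelCategoricalImputer:
    """Replace missing or previously unseen labels with a sentinel value."""

    def __init__(self, sentinel='UNK'):
        self.sentinel = sentinel
        self.known_labels_ = None

    @staticmethod
    def _is_missing(value):
        # None and NaN (the only value not equal to itself) count as missing.
        return value is None or value != value

    def fit(self, values):
        """Remember every non-missing label seen in the training data."""
        self.known_labels_ = {v for v in values if not self._is_missing(v)}
        return self

    def transform(self, values):
        """Keep known labels; map everything else to the sentinel."""
        if self.known_labels_ is None:
            raise RuntimeError('fit must be called before transform')
        return [v if v in self.known_labels_ else self.sentinel
                for v in values]


imputer = SentinelCategoricalImputer().fit(['a', 'b', None])
print(imputer.transform(['a', 'c', None]))
```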
Currently, a single datatype will have information in multiple locations:
Required
Optional: Output data type
This is absurd, and difficult to support / maintain. Another path might be to create an interface class, VariableTypeHandler, which includes the following methods:
Adding python 3.7 to .travis.yml
options
Choose and add a license
If the response variable isn't in the variable list, kp should raise a smart error message
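A check along these lines could run at construction time. The sketch below is an assumption about how such a check might look: `check_response_var` is a hypothetical function, and the keyword names mirror the Automater arguments used in the examples above.

```python
def check_response_var(response_var, variable_lists):
    """Raise a descriptive error if response_var is not a declared variable.

    `variable_lists` maps an argument name (e.g. 'numerical_vars') to the
    list of column names passed for it.
    """
    if response_var is None:
        return  # No response variable is a valid, unsupervised setup.
    declared = [v for names in variable_lists.values() for v in names]
    if response_var not in declared:
        raise ValueError(
            "response_var '{}' is not listed in any of {}. Declared "
            "variables: {}".format(
                response_var, sorted(variable_lists), sorted(declared)))


check_response_var('survived', {
    'categorical_vars': ['pclass', 'sex', 'survived'],
    'numerical_vars': ['age', 'fare'],
})
```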