bjherger / keras-pandas

keras-pandas allows users to rapidly build and iterate on deep learning models.

License: MIT License

Python 100.00%

keras-pandas's Issues

Add support for numeric data types

Acceptance criteria:

Numeric data types have null handling
Numeric data types are normalized
Automater can appropriately transform numeric-only dataframes
Automater can produce input nubs for numeric-only dataframes
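The first two criteria could be sketched in plain Python as below. This assumes mean imputation and z-score standardization, neither of which the issue specifies, and `fit_numeric` / `transform_numeric` are hypothetical names rather than keras-pandas API.

```python
import math
from statistics import mean, pstdev


def fit_numeric(values):
    """Learn the fill value and scaling parameters from the non-null entries."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    center = mean(observed)
    scale = pstdev(observed) or 1.0  # avoid dividing by zero for constant columns
    return {"fill": center, "center": center, "scale": scale}


def transform_numeric(values, params):
    """Replace nulls (None or NaN) with the learned fill value, then standardize."""
    filled = [
        params["fill"] if v is None or math.isnan(v) else v
        for v in values
    ]
    return [(v - params["center"]) / params["scale"] for v in filled]


# Assumed toy column with both kinds of null
params = fit_numeric([1.0, None, 3.0, float("nan")])
scaled = transform_numeric([1.0, None, 3.0], params)
```

Fitting and transforming are kept separate so that the parameters learned on training data can be reused on new data.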

Spaces and parentheses in columns cause "not a valid scope name"

If DataFrame column names contain spaces or parentheses, fit() raises an exception:

import pandas
import keras_pandas.Automater

data_good = pandas.DataFrame({'length': [1.0]})
data_bad = pandas.DataFrame({'length (cm)': [1.0]})

auto_good = keras_pandas.Automater.Automater(numerical_vars=['length'])
auto_bad = keras_pandas.Automater.Automater(numerical_vars=['length (cm)'])

auto_good.fit(data_good)  # works
auto_bad.fit(data_bad)    # raises ValueError: 'input_length (cm)' is not a valid scope name

Full traceback:

>>> auto_bad.fit(data_bad)
/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras_pandas/constants.py:33: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  transformed = input_dataframe[variable].as_matrix()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras_pandas/Automater.py", line 92, in fit
    input_layers, input_nub = self._create_input_nub(self._variable_type_dict, input_variables_df)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras_pandas/Automater.py", line 272, in _create_input_nub
    variable_input, variable_input_nub_tip = variable_type_handler(variable, input_dataframe)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras_pandas/constants.py", line 42, in input_nub_numeric_handler
    input_layer = keras.Input(shape=(input_sequence_length,), dtype='float32', name='input_{}'.format(variable))
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras/engine/input_layer.py", line 178, in Input
    input_tensor=tensor)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras/engine/input_layer.py", line 87, in __init__
    name=self.name)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 517, in placeholder
    x = tf.placeholder(dtype, shape=shape, name=name)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1745, in placeholder
    return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5020, in placeholder
    "Placeholder", dtype=dtype, shape=shape, name=name)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 394, in _apply_op_helper
    with g.as_default(), ops.name_scope(name) as scope:
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6040, in __enter__
    return self._name_scope.__enter__()
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/me/.virtualenvs/p3ml/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 4004, in name_scope
    raise ValueError("'%s' is not a valid scope name" % name)
ValueError: 'input_length (cm)' is not a valid scope name

I ran into this using the Iris dataset from scikit-learn. If it's an intended constraint it should be documented, but I suspect it's not.

This is using Python 3.6.5, keras-pandas 2.2.0, pandas 0.23.4.
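Until this is handled inside the library, one possible workaround is to rename the columns before fitting. The sketch below assumes that names made only of letters, digits and underscores pass TensorFlow's scope-name check; `sanitize_name` and `sanitize_columns` are hypothetical helpers, not part of keras-pandas.

```python
import re


def sanitize_name(name):
    """Map an arbitrary column name to letters, digits and underscores."""
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", name).strip("_")
    return cleaned or "unnamed"


def sanitize_columns(columns):
    """Return an {original: sanitized} mapping, failing loudly on collisions."""
    mapping = {}
    seen = {}
    for column in columns:
        new_name = sanitize_name(column)
        if new_name in seen:
            raise ValueError(
                "{!r} and {!r} both map to {!r}".format(seen[new_name], column, new_name)
            )
        seen[new_name] = column
        mapping[column] = new_name
    return mapping


mapping = sanitize_columns(["length (cm)", "width (cm)"])
# The mapping can then be passed to pandas.DataFrame.rename(columns=mapping),
# and the sanitized names used in numerical_vars.
```

The collision check matters because two distinct columns can collapse to the same sanitized name.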

Support for timestamp datatype

tensorflow signature

Acceptance criteria:

Understand TensorFlow signatures
Implement a TensorFlow signature for use with the TF backend
Create an example TF Serving app that uses the TF signature

Resources:

Alois

Boolean input handler

Boolean input handler should be the same as categorical, or boolean types should be removed

Thanks for the time you are putting in to make working with pandas and Keras easier!
I was checking out the code and the current issues, and I saw that you are using the standalone keras package rather than the version bundled in TensorFlow core.

Is there a particular reason for this? If not, what are your thoughts on switching?

Setup CI/CD

Add links to README

Add following links to README:

Source Code
Documentation
PyPi registration
CI / travis (?)

Consistent variable -> var type mapper

There should be a single function, which:

Validates that the variable is in the set of available variables
Provides the variable type for that variable
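A minimal sketch of such a function, assuming the variable lists are held in a dict that maps a type name to the variables of that type. The name `get_variable_type` and the dict layout are illustrative assumptions, not the library's actual internals.

```python
def get_variable_type(variable, variable_type_dict):
    """Validate that `variable` is known and return its variable type.

    `variable_type_dict` maps a type name to the list of variables of that type.
    """
    matches = [
        var_type
        for var_type, variables in variable_type_dict.items()
        if variable in variables
    ]
    if not matches:
        raise KeyError("Unknown variable: {!r}".format(variable))
    if len(matches) > 1:
        raise ValueError("{!r} has more than one type: {}".format(variable, matches))
    return matches[0]


# Assumed example lists
variable_types = {
    "numerical_vars": ["age", "fare"],
    "categorical_vars": ["sex", "survived"],
}
age_type = get_variable_type("age", variable_types)
```

Routing every lookup through one function keeps validation and type resolution in a single place.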

Video tutorial

Create a screen-capture tutorial, with voice-over, showing how to use the project.

Create examples folder

Check that response variable is in variable list

Address transient issue

Transient unittest issue

======================================================================
FAIL: test_transform_no_response (testtext.TestText)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/bjherger/keras-pandas/tests/testtext.py", line 46, in test_transform_no_response
    self.assertCountEqual([2, 3, 4, 5], list(X[0][0]))
AssertionError: Element counts were not equal:
First has 1, Second has 0:  5
First has 0, Second has 1:  894

----------------------------------------------------------------------

Required packages for documentation generation

Contributing.md

Move contributing info from README to a separate contributing.md file

Examples

This project should have at least two examples

Change project name to keras-pandas

It's a bit catchier.

AC:

Change PyPI registration
Change repo name
Change internal descriptions

Add lending club example

https://www.lendingclub.com/info/download-data.action

Rename test files

Rename test files to be consistent with Google's Python Style Guide

Failing text test

Find and fix transient issue:

======================================================================
FAIL: test_transform_no_response (testtext.TestText)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/bjherger/keras-pandas/tests/testtext.py", line 46, in test_transform_no_response
    self.assertCountEqual([2, 3, 4, 5], list(X[0][0]))
AssertionError: Element counts were not equal:
First has 1, Second has 0:  5
First has 0, Second has 1:  43

----------------------------------------------------------------------
Ran 23 tests in 84.650s

Version numbering should be consistent

One variable should set version numbering in:

Keras-pandas
setup.py
conf.py for sphinx documentation.

One option might be: https://github.com/warner/python-versioneer, or SemVer
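Short of adopting a tool, a common lightweight pattern is to keep `__version__` in the package and have setup.py and conf.py parse it out of the source text. The sketch below assumes the version lives in keras_pandas/__init__.py as a plain string assignment; `read_version` is a hypothetical helper.

```python
import re


def read_version(init_source):
    """Extract the value of `__version__` from the text of a module."""
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", init_source, re.MULTILINE
    )
    if match is None:
        raise RuntimeError("No __version__ assignment found")
    return match.group(1)


# In setup.py and conf.py the text would come from reading the package's
# __init__.py (assumed location) instead of this literal.
example_source = '"""Package docstring."""\n__version__ = "2.2.0"\n'
version = read_version(example_source)
```

Parsing the text avoids importing the package at install time, when its dependencies may not be present yet.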

Add support for Categorical data types

Transformations

Categorical data types have null handling
Categorical data types are normalized
Automater can appropriately transform categorical-only dataframes

Modeling

Automater can produce input nubs for categorical-only dataframes
Automater can produce output nub for categorical-only dataframes

Code base should still meet existing unittests, including those for numeric data types.
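The transformation criteria could look roughly like the sketch below. It assumes nulls arrive as `None`, that "normalized" means mapping levels to integer indices, and that unseen levels should share one reserved index; the token names and function names are hypothetical.

```python
NULL_TOKEN = "__null__"        # hypothetical placeholder for missing values
UNKNOWN_TOKEN = "__unknown__"  # hypothetical placeholder for unseen levels


def fit_categorical(values):
    """Build a level -> integer index mapping, reserving slots for unknown/null."""
    levels = sorted({str(v) for v in values if v is not None})
    vocabulary = [UNKNOWN_TOKEN, NULL_TOKEN] + levels
    return {level: index for index, level in enumerate(vocabulary)}


def transform_categorical(values, index_by_level):
    """Encode values as integers; nulls and unseen levels get reserved indices."""
    encoded = []
    for value in values:
        key = NULL_TOKEN if value is None else str(value)
        encoded.append(index_by_level.get(key, index_by_level[UNKNOWN_TOKEN]))
    return encoded


index_by_level = fit_categorical(["male", "female", None, "male"])
codes = transform_categorical(["female", None, "other"], index_by_level)
```

Integer codes of this kind are what an embedding-style input nub would typically consume.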

Support more modern keras, tf and pandas versions

Travis CI/CD has issues with latest version of tensorflow.

Options include:

Testing if travis issue has been resolved
Manually resolving travis issue
Switching to another CI/CD platform

Time stamp support

Robust text input handling

Smartly handle non-string inputs
Smartly handle null inputs
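One way to read "smartly" here is to coerce every value to a string before tokenization. The sketch below assumes nulls are `None` or NaN and that an empty string is an acceptable stand-in; `coerce_text` is a hypothetical helper.

```python
import math


def coerce_text(value, null_placeholder=""):
    """Return a string for any input; nulls become `null_placeholder`."""
    if value is None:
        return null_placeholder
    if isinstance(value, float) and math.isnan(value):
        return null_placeholder
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return str(value)


# Assumed mixed-type text column
raw = ["Mr. Smith", None, float("nan"), 42, b"Mrs. Jones"]
cleaned = [coerce_text(v) for v in raw]
```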

Automated PyPi release

Perhaps via https://docs.travis-ci.com/user/deployment/pypi/

Setup.py requirements are strict

Setup.py requirements should allow for minor version differences, or otherwise allow for small version changes.
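As a rough illustration, exact pins could be rewritten to compatible-release specifiers (`~=`, defined in PEP 440) before being passed to `install_requires`. `relax_pin` is a hypothetical helper and the requirement strings are illustrative, not the project's real requirements.

```python
import re

PIN_PATTERN = re.compile(r"^\s*([A-Za-z0-9_.\-\[\],]+)\s*==\s*([0-9][^\s;#]*)\s*$")


def relax_pin(requirement):
    """Turn `name==X.Y.Z` into `name~=X.Y`; leave other specifiers untouched."""
    match = PIN_PATTERN.match(requirement)
    if match is None:
        return requirement.strip()
    name, version = match.groups()
    parts = version.split(".")
    if len(parts) < 2:
        # `~=` needs at least two release segments, so fall back to a floor.
        return "{}>={}".format(name, version)
    return "{}~={}.{}".format(name, parts[0], parts[1])


# Illustrative requirement lines
relaxed = [relax_pin(line) for line in ["pandas==0.23.4", "numpy>=1.14", "six==1"]]
```

Here `pandas~=0.23` accepts any 0.x release from 0.23 onward but not 1.0, which matches the "minor version differences" wording above.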

Support for variable list structure & checking

Acceptance criteria:

Confirm that variables appear in only one variable list
Add variable lists from __init__ to internal state
Implement _check_input_dataframe_columns_, to check that the input dataframe has the required columns
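The first and third criteria might be sketched as follows, assuming the variable lists are stored as a dict of type name to list of variables. Both function names are illustrative and the column check takes a plain list of column names so it runs without pandas.

```python
from collections import Counter


def check_variable_lists(variable_type_dict):
    """Raise if any variable is listed more than once across the variable lists."""
    counts = Counter(
        variable
        for variables in variable_type_dict.values()
        for variable in variables
    )
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError("Variables listed more than once: {}".format(duplicates))


def check_input_columns(columns, variable_type_dict):
    """Raise if the input dataframe's columns lack any declared variable."""
    required = {v for variables in variable_type_dict.values() for v in variables}
    missing = sorted(required - set(columns))
    if missing:
        raise ValueError("Input dataframe is missing columns: {}".format(missing))


# Assumed example lists; with pandas, `columns` would be list(df.columns)
variable_lists = {"numerical_vars": ["age", "fare"], "categorical_vars": ["sex"]}
check_variable_lists(variable_lists)
check_input_columns(["age", "fare", "sex", "name"], variable_lists)
```

Extra columns in the dataframe are allowed; only missing ones are treated as an error.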

requirements.txt missing from manifest, setup.py fails

$ pip install keras-pandas
Collecting keras-pandas
  Using cached https://files.pythonhosted.org/packages/b6/4f/cd2e9c9d25024bc76d8806966bc128d4f24e37d7fb64d6ab8f7ed9422601/keras-pandas-1.3.3.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-v828ze62/keras-pandas/setup.py", line 12, in <module>
        with open('requirements.txt') as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-v828ze62/keras-pandas/

Fails on python 2.7 and 3.5.2

requirements.txt seems to be missing from the tarball. You probably need to explicitly create a MANIFEST.in to pick it up.
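With setuptools, a MANIFEST.in containing the line `include requirements.txt` is the usual way to ship that file in the sdist. A quick check that a built tarball actually contains it could look like the sketch below; `sdist_contains` is a hypothetical helper and the path passed to it is up to the caller.

```python
import tarfile


def sdist_contains(tarball_path, filename):
    """Return True if any member of the .tar.gz archive has base name `filename`."""
    with tarfile.open(tarball_path, "r:gz") as archive:
        return any(
            member.name.split("/")[-1] == filename
            for member in archive.getmembers()
        )
```

Running this against the tarball produced by `python setup.py sdist` before uploading would catch the missing file before users hit it.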

Add CI/CD PyPi links

README & setup.py should have CI/CD and PyPi links
README should have about author section (?)

Read the docs versioning

Read the Docs seems to capture only the latest version. It would be helpful to keep documentation for all versions.

Create README

Create README file, with the following sections:

Quick start
Project purpose
Installation guide
Guiding principles
Contributing

Add build status to README.md

https://travis-ci.org/bjherger/keras-pandas

Timestamp output layer

Support for timestamp output type, via a default timestamp output layer

Probably just a single-node dense layer, predicting the time as an epoch timestamp
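The target for such a layer would be a number, so timestamps need a round trip to and from epoch seconds. A minimal sketch, assuming naive datetimes should be treated as UTC; the function names are hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_seconds(timestamp):
    """Convert a datetime to seconds since 1970-01-01 UTC (naive means UTC)."""
    if timestamp.tzinfo is None:
        timestamp = timestamp.replace(tzinfo=timezone.utc)
    return timestamp.timestamp()


def from_epoch_seconds(seconds):
    """Inverse of `to_epoch_seconds`, returning a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


target = to_epoch_seconds(datetime(2018, 1, 1))
restored = from_epoch_seconds(target)
```

The second function is what an inverse transform on the model's predictions would call.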

Transformations getter & response variable inverse transformation support

Ability to pull transformations for a given variable
Ability to inverse-transform where applicable, for supported output data types.

No Module 'sklearn_pandas' for Automater

When trying to replicate the example, I receive this error. It looks like the sklearn_pandas module cannot be found when Automater is imported.

from keras import Model
from keras.layers import Dense

from keras_pandas.Automater import Automater
from keras_pandas.lib import load_titanic

observations = load_titanic()

# Transform the data set, using keras_pandas
categorical_vars = ['pclass', 'sex', 'survived']
numerical_vars = ['age', 'siblings_spouses_aboard', 'parents_children_aboard', 'fare']
text_vars = ['name']

auto = Automater(categorical_vars=categorical_vars, numerical_vars=numerical_vars, text_vars=text_vars,
                 response_var='survived')
X, y = auto.fit_transform(observations)

# Start model with provided input nub
x = auto.input_nub

# Fill in your own hidden layers
x = Dense(32)(x)
x = Dense(32, activation='relu')(x)
x = Dense(32)(x)

# End model with provided output nub
x = auto.output_nub(x)

model = Model(inputs=auto.input_layers, outputs=x)
model.compile(optimizer='Adam', loss=auto.loss, metrics=['accuracy'])

# Train model
model.fit(X, y, epochs=4, validation_split=.2)

The traceback:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-509-d3a513b032e9> in <module>()
      2 from keras.layers import Dense
      3 
----> 4 from keras_pandas.Automater import Automater
      5 from keras_pandas.lib import load_titanic
      6 

~\Documents\python_hub\lib\site-packages\keras_pandas\Automater.py in <module>()
      5 from keras.engine import Layer
      6 from keras.layers import Concatenate, Dense
----> 7 from sklearn_pandas import DataFrameMapper
      8 
      9 from keras_pandas import constants, lib

ModuleNotFoundError: No module named 'sklearn_pandas'

Time series support

Build out time series support:

Follow new data type workflow, described in contributing.md
Can be based on text var handlers (No text preprocessing, similar padding, similar input nub)

Train / test / validate split
Examples for all supported data types
Add example requirement to new data type workflow in contributing.md

Current state

Required

Automater.__init__ for passing in variable list
constants.py for default pipeline
constants.py for input handler
constants.py for input handler lookup

Optional: Output data type

constants.py Suggested loss
Automater._create_output_nub for creating an output nub
Automater.inverse_transform_output for inverse transforming the output data type.

Future state

This is absurd, and difficult to support / maintain. Another path might be to create an interface class, VariableTypeHandler, which includes the following methods:

__init__
default_transformation_pipeline
input_nub_generator
output_nub_generator (optional)
output_inverse_transform (optional)
output_suggested_loss (optional)
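The proposed interface could be sketched with an abstract base class, as below. The method signatures and the toy `NumericHandler` are assumptions for illustration; the issue only lists method names, and a real handler would return Keras layers rather than strings.

```python
from abc import ABC, abstractmethod


class VariableTypeHandler(ABC):
    """Hypothetical interface bundling everything one data type needs."""

    def __init__(self, variable):
        self.variable = variable

    @abstractmethod
    def default_transformation_pipeline(self):
        """Return the default list of transformers for this variable."""

    @abstractmethod
    def input_nub_generator(self, input_dataframe):
        """Return (input_layer, input_nub_tip) for this variable."""

    # Optional hooks: only data types usable as a response override these.
    def output_nub_generator(self, input_dataframe):
        raise NotImplementedError("{} has no output nub".format(type(self).__name__))

    def output_inverse_transform(self, predictions):
        raise NotImplementedError("{} has no inverse transform".format(type(self).__name__))

    def output_suggested_loss(self):
        raise NotImplementedError("{} has no suggested loss".format(type(self).__name__))


class NumericHandler(VariableTypeHandler):
    """Toy subclass with placeholder return values."""

    def default_transformation_pipeline(self):
        return []

    def input_nub_generator(self, input_dataframe):
        return ("input_{}".format(self.variable), None)

    def output_suggested_loss(self):
        return "mean_squared_error"


handler = NumericHandler("age")
```

Adding a new data type then means writing one subclass instead of touching Automater and constants.py in several places.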
