autonomio / astetik

Astetik takes away the pain from telling visual stories with data on Python

License: MIT License

Python 98.50% HTML 1.27% Shell 0.22%

visualization matplotlib seaborn data-science pandas descriptive-statistics jupyter

astetik's Introduction

Autonomio provides a very high-level abstraction layer for rapidly testing research ideas and instantly creating neural-network-based decision-making models. Autonomio is built on top of Keras, using TensorFlow as a backend and spaCy for word vectorization. Autonomio makes deep learning and state-of-the-art linguistic processing accessible to anyone with basic computer skills. This document focuses on an overview of Autonomio's capabilities.

If you want something higher level, visit the website.

Getting Started

The simplest way is to install with pip from the repo directly.

pip install git+https://github.com/autonomio/core-module.git

User Documentation

You can find comprehensive user documentation with code examples here.

Contribution Guidelines

Contributions are most welcome; read more here.

Examples

capabilities overview link
data transformation link
hyperparameter search link

(more examples coming soon / dated 31st of July, 2017)

Key Features

intuitive single-command user interface
hyperparameter grid search
comprehensive automated data transformation
optimized for Jupyter notebook use
NN shape selection and other unique configurations
create MLP, LSTM and Regression models
seamlessly integrates word2vec with Keras deep learning
interactive plots specifically designed for deep learning model evaluation

For most use cases, successfully running a neural network works out of the box with zero configuration, yielding a model that can be used to predict outcomes later.

Out-of-the-box use cases

Autonomio is the only deep learning workbench 100% focused on data science applications as opposed to perception problems (e.g. image detection), and has been used in a wide range of industrial and academic use cases.

Sentiment analysis
Social media account classification
Spam detection
Website classification
Fraud detection
Employee satisfaction evaluation
Popular Kaggle challenges (e.g. Titanic)

One line use examples

Training a model

First take care of the imports:

from autonomio.commands import train, predictor
%matplotlib inline

Then train the model:

train(x, y, data)

Training an LSTM model is even simpler:

train(x, model='lstm')

Making a prediction

predictor(data, saved_model_name)

Visualization

Standard Training Output

LSTM Training output

Hyperscan Output

Tested Systems

Autonomio has been tested in several Mac OSX and Ubuntu environments (both server and desktop). Travis builds use Ubuntu Precise.

Minimum Hardware

You need a machine with at least 4 GB of memory if you want to do text processing; otherwise 2 GB is totally fine and 1 GB might be OK. Even a very low-spec AWS instance runs Autonomio just fine.

Recommended setup

For research and production environments we recommend one server with at least 4 GB of memory as a 'work station' and a separate instance with a high-end CUDA-supported GPU. The GPU instance costs roughly $1 per hour, and can be shut down when not in use. As setting up the GPU station from scratch can be a bit of a headache, we recommend using the AWS Machine Learning AMI to get set up quickly.

Dependencies

Major credits to all the contributors to these amazing packages. Autonomio would definitely not be possible without them.


astetik's Issues

compare visual bug

If there are many values in label_col, the height of the graphic is not right.

KeyError: 'savefig.frameon is not a valid rc parameter (see rcParams.keys() for a list of valid parameters)'

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/wip/lib/python3.8/site-packages/matplotlib/__init__.py in __setitem__(self, key, val)
    676             try:
--> 677                 cval = self.validate[key](val)
    678             except ValueError as ve:

KeyError: 'savefig.frameon'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-11-3c6b5ba0f4ae> in <module>
----> 1 a.plot_box(t.data.activation, t.data.val_f1score)

~/dev/talos/talos/commands/analyze.py in plot_box(self, x, y, hue)
    137         try:
    138             import astetik as ast
--> 139             return ast.box(self.data, x, y, hue)
    140         except RuntimeError:
    141             print('Matplotlib Runtime Error. Plots will not work.')

~/miniconda3/envs/wip/lib/python3.8/site-packages/astetik/plots/box.py in box(data, x, y, hue, palette, style, dpi, title, sub_title, x_label, y_label, legend, x_scale, y_scale, x_limit, y_limit, save)
    114 
    115     # HEADER STARTS >>>
--> 116     palette = _header(palette, style, n_colors=n, dpi=dpi)  # NOTE: y exception
    117     # <<< HEADER ENDS
    118     p, ax = plt.subplots(figsize=(params()['fig_width'],

~/miniconda3/envs/wip/lib/python3.8/site-packages/astetik/style/template.py in _header(palette, style, n_colors, dpi, fig_width, fig_height)
     48     style_dic = styles(dpi)
     49     for key in style_dic.keys():
---> 50         rcParams[key] = style_dic[key]
     51 
     52     return palette

~/miniconda3/envs/wip/lib/python3.8/site-packages/matplotlib/__init__.py in __setitem__(self, key, val)
    680             dict.__setitem__(self, key, cval)
    681         except KeyError as err:
--> 682             raise KeyError(
    683                 f"{key} is not a valid rc parameter (see rcParams.keys() for "
    684                 f"a list of valid parameters)") from err

KeyError: 'savefig.frameon is not a valid rc parameter (see rcParams.keys() for a list of valid parameters)'
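
The traceback shows `_header` copying every key of the style dictionary into `rcParams`, and Matplotlib rejecting `savefig.frameon`. One possible workaround, sketched here under the assumption that unknown keys can simply be skipped, is to check each key against the target mapping before assigning it. The helper name `apply_known_styles` and its return value are illustrative and not part of astetik.

```python
def apply_known_styles(rc, style_dic):
    """Copy only the keys that `rc` already knows about; return the rest.

    `rc` is any mutable mapping of valid parameters, for example
    matplotlib.rcParams. Hypothetical helper, not astetik's API.
    """
    skipped = []
    for key, value in style_dic.items():
        if key in rc:
            rc[key] = value
        else:
            # For example, a parameter the installed Matplotlib no longer accepts.
            skipped.append(key)
    return skipped


# Plain dict standing in for matplotlib.rcParams so the sketch runs on its own.
rc = {"savefig.dpi": 100, "lines.linewidth": 1.5}
skipped = apply_known_styles(rc, {"savefig.dpi": 72, "savefig.frameon": False})
```

With Matplotlib installed, passing `matplotlib.rcParams` as `rc` would apply the same check against whatever parameters the installed version accepts.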

bar() does not handle thousand separators

This should be included in every plot where it makes sense.
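
As a rough illustration of what thousand separators on an axis could look like, the sketch below defines a tick-label callback. The function name `thousands` is an assumption; the `(value, pos)` signature is the one `matplotlib.ticker.FuncFormatter` expects from its callback.

```python
def thousands(value, pos=None):
    """Format a tick value with comma thousand separators.

    The (value, pos) signature matches what matplotlib.ticker.FuncFormatter
    passes to its callback; `pos` is unused here.
    """
    return f"{value:,.0f}"


labels = [thousands(v) for v in (950, 12000, 3456789.4)]
```

On a Matplotlib axis this could be attached with `ax.yaxis.set_major_formatter(FuncFormatter(thousands))`; where exactly astetik would hook this in is not specified in the issue.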

get coverage to >95%

The title says it all.

Random function is not working for transforms

Several plots use the _groubpy function in /utils/transforms.py to group by various functions, including random, but random is not working.

scaling needs to be checked for the line plot

In general, scaling (to log and so forth) might be better offered as two options: one where it is done at the plot level after creating the axes, and one where it is done on the data beforehand.

unhashable type: '_ColorPalette'

Full code and trace

https://colab.research.google.com/drive/1XBMF3tOQwRLuPrGJib-YkN20_9nmMVT4#scrollTo=Cgi6jl_6R6fj&line=8&uniqifier=1

legends need to have a heading

Most of the plots do not have a heading for the legend, which might cause confusion in many conceivable cases.

for the tables better to allow any metric

Instead of the current fixed metrics (mean, std, total...), it would be better to let users choose which metrics they want.
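
One way to picture user-selectable metrics is a registry that maps metric names to callables, so a table is built from whatever names the caller passes. This is a minimal sketch using only the standard library; the registry contents, the `summarize` name and the input shape (a dict of feature name to values) are assumptions, not astetik's implementation.

```python
import statistics

# Name -> callable registry; the names are illustrative assumptions.
METRICS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "std": statistics.stdev,
    "total": sum,
    "min": min,
    "max": max,
}


def summarize(columns, metrics):
    """Return {feature: {metric: value}} for the requested metrics only."""
    unknown = [name for name in metrics if name not in METRICS]
    if unknown:
        raise ValueError(f"unknown metrics: {unknown}")
    return {
        feature: {name: METRICS[name](values) for name in metrics}
        for feature, values in columns.items()
    }


table = summarize({"age": [20, 30, 40], "score": [1, 2, 6]}, ["mean", "total"])
```

Because both the features and the metrics are plain iterables, the same function works for any number of either.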

handling of long x labels and legend labels

There seem to be three options that the user should have:

truncate (to a fixed length)
rotate (does not work for legends)
insert manually shorter ones (already added to line())
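
The first option, truncating to a fixed length, could look roughly like the sketch below. The function name, the default length of 12 and the "..." suffix are all assumptions made for illustration.

```python
def truncate_label(label, max_len=12, suffix="..."):
    """Shorten `label` to at most `max_len` characters, marking the cut."""
    text = str(label)
    if len(text) <= max_len:
        return text
    if max_len <= len(suffix):
        # No room for the suffix, so cut hard.
        return text[:max_len]
    return text[: max_len - len(suffix)] + suffix


labels = [truncate_label(s) for s in ("short", "a_very_long_category_name")]
```

Since the helper works on plain strings, it could be applied to x-axis labels and legend labels alike, which rotation cannot do.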

OLS should allow more features

Instead of just the current 3 dv, there should be a way to add as many as one likes. The other option is to drop OLS entirely.

combine redundant codes in /utils/transforms.py

The groupby code can be made into one function that serves both purposes.

y_limit not working in hist()

Possibly in other plots too, but at least in hist().

x_limit and y_limit need to be None for all plots by default

Right now some of the plots have it as 'auto', even though it does not make sense to limit x when, for example, it is used for category labels (e.g. box()).
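
A default of None is easy to handle if limits are applied only when the caller supplies them. The sketch below assumes each limit is a (low, high) pair and that the axes object exposes `set_xlim` and `set_ylim`, as Matplotlib axes do; `apply_limits` and `RecordingAxes` are hypothetical names.

```python
def apply_limits(ax, x_limit=None, y_limit=None):
    """Set axis limits only when the caller supplies them.

    Assumes each limit is a (low, high) pair; `ax` is a Matplotlib Axes
    or any object exposing set_xlim/set_ylim.
    """
    if x_limit is not None:
        ax.set_xlim(*x_limit)
    if y_limit is not None:
        ax.set_ylim(*y_limit)
    return ax


class RecordingAxes:
    """Tiny stand-in for an Axes so the sketch runs without Matplotlib."""

    def __init__(self):
        self.calls = []

    def set_xlim(self, low, high):
        self.calls.append(("x", low, high))

    def set_ylim(self, low, high):
        self.calls.append(("y", low, high))


ax = apply_limits(RecordingAxes(), y_limit=(0, 10))
```

With this shape, a categorical x axis such as the one in box() is left untouched unless a limit is passed explicitly.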

sorting is not working in bar()

I suppose that this has to do with the way the 'x' and 'y' are handled (and possibly confused with each other).

scale is not working when input is list

If line() gets multiple values in a list, scale is not working.

box plot labels not showing right in some cases

With the patient_data.csv file the x-labels are not showing correctly. Just integers are shown.

can't import astetik without internet connection

When doing import astetik the issue is with countries.csv. Will have to look for a better way to do this.
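
One possible direction, assuming a copy of the file can be shipped alongside the package, is to read it lazily on first use instead of at import time. The sketch below is only an illustration: `load_countries` and the idea of a bundled local path are assumptions, not how astetik currently works.

```python
import csv
from functools import lru_cache


@lru_cache(maxsize=None)
def load_countries(path):
    """Read a local CSV on first use and cache the rows afterwards.

    Nothing is read or downloaded at import time. `path` is assumed to
    point at a copy of the file shipped with the package.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return tuple(csv.DictReader(handle))
```

Plots that never need country data would then never touch the file, and repeated calls reuse the cached rows.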

Count Plot is not allowing a boolean variable

For some reason the seaborn backend returns an error for this.

Change Legend location and/or Legend Visualization in bargrid-function

Hey,

First of all, thanks for creating and supporting talos/astetik! I came to astetik via Talos, and both tools are highly useful in machine learning for someone like me who is not that advanced in coding.

While creating bargrid plots I noticed that the legend of the hue variable often partially blocks some of the sub-plots (picture 1).

The docs for the bargrid function show that there was once a legend_position variable, which is currently commented out.

My questions are:

Is it possible to (re-)enable setting the legend position?
Or is it feasible (and if so, how) to add an option to change the legend background to solid, as you did in your machinelearningmastery article "Hyperparameter Optimization with Keras" (picture 2)?

I think that adding this functionality (back?) would greatly help the visualization, especially if you share it with other people.
If these options are already implemented and I just couldn't find them, I apologize. I'm quite new to Python data visualization and probably didn't notice the solutions.

Greetings

scat() plot legend could be more clear

Right now it's standing out well. I think it will be good to standardize the legend across all plots to be as it's done in line() currently.

allow two datasets in single histogram

ast.hist() should allow two overlapping datasets simultaneously.

pip install astetik is failing due to sklearn in requirements.txt instead of scikit-learn

pip install astetik is failing because it is unable to install the deprecated sklearn dependency. On the PyPI page of sklearn, there is a notice to start using scikit-learn instead of sklearn.
Request to change this in requirements.txt and setup.py.

Error message of pip installation in a fresh python environment

pip install astetik
Collecting astetik
  Using cached astetik-1.13-py3-none-any.whl (5.4 MB)
Collecting wrangle
  Using cached wrangle-0.7.2-py3-none-any.whl (52 kB)
Collecting pandas
  Using cached pandas-2.0.0-cp310-cp310-win_amd64.whl (11.2 MB)
Collecting geonamescache
  Using cached geonamescache-1.5.0-py3-none-any.whl (26.4 MB)
Collecting seaborn
  Using cached seaborn-0.12.2-py3-none-any.whl (293 kB)
Collecting statsmodels
  Using cached statsmodels-0.13.5-cp310-cp310-win_amd64.whl (9.1 MB)
Collecting sklearn
  Using cached sklearn-0.0.post4.tar.gz (3.6 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\mathavraj.j\AppData\Local\Temp\pip-install-bz3pwm5u\sklearn_08f62083d18c41bc8e66d870ee553b3f\setup.py", line 10, in <module>
          LONG_DESCRIPTION = f.read()
        File "C:\Users\mathavraj.j\AppData\Local\Programs\HCLTech\AION\2.7.0.3\python\lib\encodings\cp1252.py", line 23, in decode
          return codecs.charmap_decode(input,self.errors,decoding_table)[0]
      UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 7: character maps to <undefined>
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
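
The requested change amounts to swapping the name `sklearn` for `scikit-learn` wherever dependencies are listed. As a small sketch of that substitution, assuming requirement lines start with a plain package name optionally followed by a version specifier, the hypothetical helper below rewrites one line at a time and leaves everything else alone.

```python
import re

# Deprecated PyPI name -> replacement, as requested in the issue.
RENAMES = {"sklearn": "scikit-learn"}


def fix_requirement(line):
    """Replace a deprecated package name at the start of a requirement line."""
    match = re.match(r"^\s*([A-Za-z0-9_.\-]+)(.*)$", line)
    if not match:
        # Comments, blank lines and anything unexpected pass through unchanged.
        return line
    name, rest = match.groups()
    return RENAMES.get(name.lower(), name) + rest


fixed = [fix_requirement(r) for r in ("pandas", "sklearn", "seaborn")]
```

The same rename would apply to the dependency list in setup.py; the full contents of either file are not shown in the issue, so only the package names visible in the pip log are used here.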

markerstyle is not working in line()

Also markersize should be added as an option.

Issue with images in docs

There seems to be some issue with the images on the main page of the docs:

I'm viewing the page with Microsoft Edge 95.0.1020.53 (64-bit).

table should support any number of features

There should be a way to create the table dynamically in a way that allows any number of features to be included.

default palette needs to be tweaked with comparison of two sets in mind

Currently the second color for the default palette is very light. Needs to be darker.

add scaling for 'log' and 'symlog' for special cases

Some plots, like multikde(), are not using the standard scaling module. Such cases need a separate scaling function. It might make sense to take scaling out of matplotlib completely and handle it all in numpy before passing the data. This would also allow more scaling options.
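
To make the idea of scaling the data before plotting concrete, here is a minimal sketch of a log and a symlog-style transform. It uses the standard library rather than numpy so that it runs anywhere; the function names, the `linthresh` parameter and the particular symlog formula are assumptions, and the result is not identical to Matplotlib's own 'symlog' axis scale.

```python
import math


def log_scale(values):
    """Base-10 log of strictly positive values."""
    return [math.log10(v) for v in values]


def symlog_scale(values, linthresh=1.0):
    """Signed log transform that stays defined at zero and for negatives.

    Uses sign(v) * log10(1 + |v| / linthresh), one common symlog-style
    formula; it is not identical to Matplotlib's 'symlog' axis scale.
    """
    return [math.copysign(math.log10(1 + abs(v) / linthresh), v) for v in values]


scaled = symlog_scale([-99, 0, 9, 999])
```

A trade-off of transforming the data first is that tick labels then show transformed values, so a plot would need to relabel its axis to stay readable.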

Indentation of labels in analyze_object.correlate

Hello,

For some reason I get indented labels (with different indentation for different Scan runs). The figure shows an example of the label indentation in the heatmap correlation plot.

How could this be solved?

Thank you

count() gets squashed with titles if just two values

This seems to be just the case when there are very few values.
