
open_source_demos's Introduction

Alteryx Open Source Demos

This repository contains a series of demos that leverage EvalML, Featuretools, Woodwork, and Compose. Each demo relies on a different subset of these libraries and varies in complexity.

Building an accurate machine learning model requires several important steps. One of the most complex and time-consuming is extracting information through features. Finding the right features is a crucial component both of interpreting the dataset as a whole and of building a model with great predictive power. Another core component of any machine learning process is selecting the right estimator for the problem at hand. By combining the best features with the most accurate estimator and its corresponding hyperparameters, we can build a machine learning model that generalizes well to unseen data. Just as Featuretools makes feature engineering simple, EvalML makes automated machine learning easy to implement.
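For a concrete sense of how the pieces fit together, here is a minimal sketch, assuming featuretools 1.x (older releases name the target_dataframe_name argument target_entity) and using the demo data bundled with Featuretools; the binary label fabricated from COUNT(transactions) and the data sizes are purely illustrative:

    import featuretools as ft
    from evalml.automl import AutoMLSearch

    # Build an EntitySet from the demo data that ships with Featuretools.
    es = ft.demo.load_mock_customers(n_customers=60, n_sessions=300,
                                     n_transactions=5000, return_entityset=True)

    # Deep Feature Synthesis: one row of generated features per customer.
    feature_matrix, feature_defs = ft.dfs(entityset=es,
                                          target_dataframe_name="customers")

    # Fabricate an illustrative binary target from one generated feature.
    counts = feature_matrix.pop("COUNT(transactions)")
    y = counts > counts.median()

    # Search over pipelines (preprocessing + estimator + hyperparameters).
    automl = AutoMLSearch(X_train=feature_matrix, y_train=y, problem_type="binary")
    automl.search()
    print(automl.best_pipeline)

Each demo in this repository follows this same pattern on a real dataset, often using Compose to generate the labels and Woodwork to manage column types.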

Running these tutorials

  1. Clone the repository.

    git clone https://github.com/alteryx/open-source-demos

  2. Install the requirements. We recommend creating a new virtual environment so these libraries are installed separately from your system packages.

    pip install -r requirements.txt

    To run the demos fully, also install Graphviz, following the instructions in the Featuretools documentation.

  3. Download the data.

    You can download the data for each demo by following the instructions in its tutorial. The dataset is usually kept in a folder named data within the project structure.

  4. The tutorials can be run in Jupyter Notebook.

    jupyter notebook

Built at Alteryx Innovation Labs


open_source_demos's People

Contributors

bschreck, bukosabino, chriskaschner, gsheni, jeff-hernandez, kmax12, parthivnaresh, prateekmantha, realxujiang, rwedge, seth-rothschild, thehomebrewnerd


open_source_demos's Issues

data leakage in predict_next_purchases

Hey,

in the notebook, when using:

clf.fit(X, y)
top_features = utils.feature_importances(clf, features_encoded, n=20)

we introduce data leakage, since we select the features on the whole dataset.
With scikit-learn's pipelines it's possible to select the 20 best features (or rather a fraction of the features) for each fold with SelectFromModel.

With the current setup, you are probably overestimating the AUC.
Besides, cross_val_score assumes IID samples. That will clearly not be the case here, since one entity typically has several occurrences. I think something like TimeSeriesSplit, or rather an adaptation of it (since we don't have time series in the classical sense but rather time slices), would be the correct thing to use here.
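A minimal sketch of both fixes combined, assuming X and y are the encoded feature matrix and labels from the notebook and groups holds each row's user id (all three names are assumptions); GroupKFold handles the repeated-entity part, while a time-aware splitter would still be needed for the time-slice part:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Selection lives inside the pipeline, so it is re-fit on each training
# fold and never sees the held-out fold (no selection leakage).
pipe = Pipeline([
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=100))),
    ("clf", RandomForestClassifier(n_estimators=400)),
])

# Keep all rows of one entity in the same fold.
scores = cross_val_score(pipe, X, y, scoring="roc_auc",
                         cv=GroupKFold(n_splits=3), groups=groups)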

Comments on those issues?

Currently, at work, I have the same issues, so I really appreciate the library you've developed so far. I haven't seen anything similar, so thumbs up in any case.

module 'dask' has no attribute 'config'

Traceback (most recent call last):
  File "process_data.py", line 1, in <module>
    import featuretools as ft
  File "/Users/trinadhimmedisetty/anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/__init__.py", line 7, in <module>
    from .synthesis.api import *
  File "/Users/trinadhimmedisetty/anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/synthesis/__init__.py", line 3, in <module>
    from .api import *
  File "/Users/trinadhimmedisetty/anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/synthesis/api.py", line 5, in <module>
    from .dfs import dfs
  File "/Users/trinadhimmedisetty/anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/synthesis/dfs.py", line 5, in <module>
    from featuretools.computational_backends import calculate_feature_matrix
  File "/Users/trinadhimmedisetty/anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/computational_backends/__init__.py", line 2, in <module>
    from .api import *
  File "/Users/trinadhimmedisetty/anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/computational_backends/api.py", line 2, in <module>
    from .calculate_feature_matrix import (
  File "/Users/trinadhimmedisetty/anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/computational_backends/calculate_feature_matrix.py", line 17, in <module>
    from .utils import (
  File "/Users/trinadhimmedisetty/anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/computational_backends/utils.py", line 9, in <module>
    from distributed import Client, LocalCluster
  File "/Users/trinadhimmedisetty/anaconda3/lib/python3.6/site-packages/distributed/__init__.py", line 3, in <module>
    from . import config
  File "/Users/trinadhimmedisetty/anaconda3/lib/python3.6/site-packages/distributed/config.py", line 13, in <module>
    config = dask.config.config
AttributeError: module 'dask' has no attribute 'config'

File Not Found Error

While running Tutorial.ipynb on my local machine, I got a FileNotFoundError on fms_out = feature_matrices.compute():

[                                        ] | 0% Completed |  0.1s

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-14-427963b5bcd5> in <module>()
----> 1 fms_out = feature_matrices.compute()
      2 X = pd.concat(fms_out)

~/miniconda3/lib/python3.6/site-packages/dask/base.py in compute(self, **kwargs)
    154         dask.base.compute
    155         """
--> 156         (result,) = compute(self, traverse=False, **kwargs)
    157         return result
    158 

~/miniconda3/lib/python3.6/site-packages/dask/base.py in compute(*args, **kwargs)
    393     keys = [x.__dask_keys__() for x in collections]
    394     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 395     results = schedule(dsk, keys, **kwargs)
    396     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    397 

~/miniconda3/lib/python3.6/site-packages/dask/multiprocessing.py in get(dsk, keys, num_workers, func_loads, func_dumps, optimize_graph, **kwargs)
    170                            get_id=_process_get_id, dumps=dumps, loads=loads,
    171                            pack_exception=pack_exception,
--> 172                            raise_exception=reraise, **kwargs)
    173     finally:
    174         if cleanup:

~/miniconda3/lib/python3.6/site-packages/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
    499                         _execute_task(task, data)  # Re-execute locally
    500                     else:
--> 501                         raise_exception(exc, tb)
    502                 res, worker_id = loads(res_info)
    503                 state['cache'][key] = res

~/miniconda3/lib/python3.6/site-packages/dask/compatibility.py in reraise(exc, tb)
    109     def reraise(exc, tb=None):
    110         if exc.__traceback__ is not tb:
--> 111             raise exc.with_traceback(tb)
    112         raise exc
    113 

~/miniconda3/lib/python3.6/site-packages/dask/local.py in execute_task()
    270     try:
    271         task, data = loads(task_info)
--> 272         result = _execute_task(task, data)
    273         id = get_id()
    274         result = dumps((result, id))

~/miniconda3/lib/python3.6/site-packages/dask/local.py in _execute_task()
    251         func, args = arg[0], arg[1:]
    252         args2 = [_execute_task(a, cache) for a in args]
--> 253         return func(*args2)
    254     elif not ishashable(arg):
    255         return arg

~/miniconda3/lib/python3.6/site-packages/dask/bag/core.py in reify()
   1547 def reify(seq):
   1548     if isinstance(seq, Iterator):
-> 1549         seq = list(seq)
   1550     if seq and isinstance(seq[0], Iterator):
   1551         seq = list(map(list, seq))

~/miniconda3/lib/python3.6/site-packages/dask/bag/core.py in map_chunk()
   1706                 yield f(**k)
   1707     else:
-> 1708         for a in zip(*args):
   1709             yield f(*a)
   1710 

~/miniconda3/lib/python3.6/site-packages/dask/bag/core.py in map_chunk()
   1706                 yield f(**k)
   1707     else:
-> 1708         for a in zip(*args):
   1709             yield f(*a)
   1710 

~/miniconda3/lib/python3.6/site-packages/dask/bag/core.py in map_chunk()
   1707     else:
   1708         for a in zip(*args):
-> 1709             yield f(*a)
   1710 
   1711     # Check that all iterators are fully exhausted

~/Py/predict-next-purchase/utils.py in load_entityset()
      5 
      6 def load_entityset(data_dir):
----> 7     order_products = pd.read_csv(os.path.join(data_dir, "order_products__prior.csv"))
      8     orders = pd.read_csv(os.path.join(data_dir, "orders.csv"))
      9     departments = pd.read_csv(os.path.join(data_dir, "departments.csv"))

~/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f()
    676                     skip_blank_lines=skip_blank_lines)
    677 
--> 678         return _read(filepath_or_buffer, kwds)
    679 
    680     parser_f.__name__ = name

~/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _read()
    438 
    439     # Create the parser.
--> 440     parser = TextFileReader(filepath_or_buffer, **kwds)
    441 
    442     if chunksize or iterator:

~/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__()
    785             self.options['has_index_names'] = kwds['has_index_names']
    786 
--> 787         self._make_engine(self.engine)
    788 
    789     def close(self):

~/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine()
   1012     def _make_engine(self, engine='c'):
   1013         if engine == 'c':
-> 1014             self._engine = CParserWrapper(self.f, **self.options)
   1015         else:
   1016             if engine == 'python':

~/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__()
   1706         kwds['usecols'] = self.usecols
   1707 
-> 1708         self._reader = parsers.TextReader(src, **kwds)
   1709 
   1710         passed_names = self.names is None

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: File b'partitioned_data/.DS_Store/order_products__prior.csv' does not exist
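The failing path suggests the macOS .DS_Store entry inside partitioned_data was treated as a partition directory. A minimal sketch of guarding the partition listing against hidden entries (the directory name comes from the traceback; the rest is an assumption about how the partitions are enumerated):

import os

data_dir = "partitioned_data"

# Keep only real partition directories; skip hidden entries like .DS_Store.
partitions = [
    os.path.join(data_dir, d)
    for d in sorted(os.listdir(data_dir))
    if not d.startswith(".") and os.path.isdir(os.path.join(data_dir, d))
]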

Attribute Error

$ python process_data.py
Traceback (most recent call last):
  File "process_data.py", line 1, in <module>
    import featuretools as ft
  File "/home/fh/.local/lib/python2.7/site-packages/featuretools/__init__.py", line 5, in <module>
    from .entityset.api import *
  File "/home/fh/.local/lib/python2.7/site-packages/featuretools/entityset/__init__.py", line 2, in <module>
    from .api import *
  File "/home/fh/.local/lib/python2.7/site-packages/featuretools/entityset/api.py", line 3, in <module>
    from .entityset import EntitySet
  File "/home/fh/.local/lib/python2.7/site-packages/featuretools/entityset/entityset.py", line 5, in <module>
    import dask.dataframe as dd
  File "/home/fh/.local/lib/python2.7/site-packages/dask/dataframe/__init__.py", line 12, in <module>
    from .rolling import (rolling_count, rolling_sum, rolling_mean, rolling_median,
  File "/home/fh/.local/lib/python2.7/site-packages/dask/dataframe/rolling.py", line 200, in <module>
    rolling_count = wrap_rolling(pd.rolling_count, 'count')
AttributeError: 'module' object has no attribute 'rolling_count'

ModuleNotFoundError: No module named 'woodwork.serialize'

Bug report.
Steps:

  • added csv to data folder in root
  • created virtualenv
  • installed requirements
  • opened jupyter notebook
  • ran ~/open_source_demos/predict-next-purchase/Tutorial.ipynb
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[28], line 6
      4 import os
      5 import composeml as cp
----> 6 import featuretools as ft
      7 import dask.dataframe as dd
      8 import numpy as np

File ~/Desktop/featuretools/lib/python3.8/site-packages/featuretools/__init__.py:4
      2 from .config_init import config
      3 from . import variable_types
----> 4 from .entityset.api import *
      5 from . import primitives
      6 from .synthesis.api import *

File ~/Desktop/featuretools/lib/python3.8/site-packages/featuretools/entityset/__init__.py:2
      1 # flake8: noqa
----> 2 from .api import *

File ~/Desktop/featuretools/lib/python3.8/site-packages/featuretools/entityset/api.py:3
      1 # flake8: noqa
      2 from .deserialize import read_entityset
----> 3 from .entityset import EntitySet
      4 from .relationship import Relationship
      5 from .timedelta import Timedelta

File ~/Desktop/featuretools/lib/python3.8/site-packages/featuretools/entityset/entityset.py:12
      9 from woodwork import init_series
     10 from woodwork.logical_types import Datetime
---> 12 from featuretools.entityset import deserialize, serialize
     13 from featuretools.entityset.relationship import Relationship, RelationshipPath
     14 from featuretools.feature_base.feature_base import _ES_REF

File ~/Desktop/featuretools/lib/python3.8/site-packages/featuretools/entityset/serialize.py:7
      4 import tarfile
      5 import tempfile
----> 7 from woodwork.serialize import typing_info_to_dict
      9 from featuretools.utils.gen_utils import import_or_none
     10 from featuretools.utils.s3_utils import get_transport_params, use_smartopen_es

ModuleNotFoundError: No module named 'woodwork.serialize'
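This usually indicates that the installed woodwork release is newer than the one this featuretools build imports from (the woodwork.serialize module was later reorganized). A quick sanity check of what actually got installed, as a sketch, so the versions can be compared against the pins in requirements.txt:

from importlib.metadata import version  # Python 3.8+

# If woodwork is newer than the version pinned in requirements.txt,
# downgrading it (or upgrading featuretools) resolves the import error.
print("featuretools:", version("featuretools"))
print("woodwork:", version("woodwork"))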

create features on one dataset

I have tried to create automated features using only one dataset, but it doesn't work. Does this mean I can only use Featuretools when I have two or more datasets? The code is below:

# create the entityset
es = ft.EntitySet(id='clients')

# create an entity from the single dataframe
es = es.entity_from_dataframe(entity_id='app', dataframe=data, index='customerid')

# default primitives from featuretools
default_agg_primitives = ["sum", "std", "max", "skew", "min", "mean", "count", "percent_true", "num_unique", "mode"]
default_trans_primitives = ["day", "year", "month", "weekday", "haversine", "num_words", "num_characters"]

# DFS with specified primitives
feature_matrix, feature_names = ft.dfs(entityset=es, target_entity='app',
                                       trans_primitives=default_trans_primitives,
                                       agg_primitives=default_agg_primitives,
                                       max_depth=2, features_only=False, verbose=True)

print('%d Total Features' % len(feature_names))

This returns the same number of features as the original dataframe; no new features are created.
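Featuretools does work with a single table, but aggregation primitives need a parent-child relationship to aggregate across, and transform primitives only fire on columns with compatible types (day/month/weekday need a datetime column, haversine needs lat/long, and so on). One option is to normalize a second entity out of the single table; a sketch against the same pre-1.0 API as above, with a toy stand-in dataframe (all column names are assumptions):

import pandas as pd
import featuretools as ft

# Toy stand-in for the single table.
data = pd.DataFrame({
    "customerid": [1, 2, 3, 4],
    "zip_code": ["02101", "02101", "60607", "60607"],
    "income": [60, 75, 50, 80],
})

es = ft.EntitySet(id='clients')
es = es.entity_from_dataframe(entity_id='app', dataframe=data, index='customerid')

# Split a parent entity out of the table so aggregation primitives
# have a relationship to aggregate across.
es = es.normalize_entity(base_entity_id='app',
                         new_entity_id='zip_codes',
                         index='zip_code')

# Per-zip-code aggregates now come back as features on 'app'.
feature_matrix, feature_names = ft.dfs(entityset=es, target_entity='app',
                                       agg_primitives=['mean', 'count'],
                                       max_depth=2, verbose=True)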

Broken link to data

In the README, the link at "here" is broken:

You can download the data directly from Instacart here.

fails to install on a Windows 10 computer with Python 3.5.2

windows 10 computer

Microsoft Windows [Version 10.0.16299.125]
(c) 2017 Microsoft Corporation. All rights reserved.

E:\GBMs xgboost catboost etc\code>python
Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

Microsoft Windows [Version 10.0.16299.125]
(c) 2017 Microsoft Corporation. All rights reserved.

E:\GBMs xgboost catboost etc\code>pip install featuretools
Collecting featuretools
Downloading featuretools-0.1.17.tar.gz (127kB)
100% |################################| 133kB 850kB/s
Collecting numpy>=1.14.0 (from featuretools)
Downloading numpy-1.14.0-cp35-none-win_amd64.whl (13.4MB)
100% |################################| 13.4MB 89kB/s
Collecting scipy>=1.0.0 (from featuretools)
Downloading scipy-1.0.0-cp35-none-win_amd64.whl (30.8MB)
100% |################################| 30.8MB 39kB/s
Collecting pandas>=0.20.3 (from featuretools)
Downloading pandas-0.22.0-cp35-cp35m-win_amd64.whl (9.0MB)
100% |################################| 9.0MB 135kB/s
Collecting s3fs>=0.1.2 (from featuretools)
Downloading s3fs-0.1.3-py2.py3-none-any.whl
Collecting tqdm>=4.19.2 (from featuretools)
Downloading tqdm-4.19.5-py2.py3-none-any.whl (51kB)
100% |################################| 61kB 2.4MB/s
Collecting toolz>=0.8.2 (from featuretools)
Downloading toolz-0.9.0.tar.gz (45kB)
100% |################################| 51kB 3.5MB/s
Requirement already satisfied: dask[complete] in c:\users\cde3\anaconda3\lib\site-packages (from featuretools)
Requirement already satisfied: pyyaml>=3.12 in c:\users\cde3\anaconda3\lib\site-packages (from featuretools)
Collecting cloudpickle>=0.4.0 (from featuretools)
Downloading cloudpickle-0.5.2-py2.py3-none-any.whl
Collecting future>=0.16.0 (from featuretools)
Downloading future-0.16.0.tar.gz (824kB)
100% |################################| 829kB 1.0MB/s
Collecting pympler>=0.5 (from featuretools)
Downloading Pympler-0.5.tar.gz (170kB)
100% |################################| 174kB 1.1MB/s
Requirement already satisfied: python-dateutil>=2 in c:\users\cde3\anaconda3\lib\site-packages (from pandas>=0.20.3->featuretools)
Requirement already satisfied: pytz>=2011k in c:\users\cde3\anaconda3\lib\site-packages (from pandas>=0.20.3->featuretools)
Collecting boto3 (from s3fs>=0.1.2->featuretools)
Downloading boto3-1.5.24-py2.py3-none-any.whl (128kB)
100% |################################| 133kB 1.5MB/s
Collecting distributed>=1.10 (from dask[complete]->featuretools)
Downloading distributed-1.20.2-py2.py3-none-any.whl (425kB)
100% |################################| 430kB 1.3MB/s
Requirement already satisfied: partd>=0.3.5 in c:\users\cde3\anaconda3\lib\site-packages (from dask[complete]->featuretools)
Requirement already satisfied: six>=1.5 in c:\users\cde3\anaconda3\lib\site-packages (from python-dateutil>=2->pandas>=0.20.3->featuretools)
Collecting botocore<1.9.0,>=1.8.38 (from boto3->s3fs>=0.1.2->featuretools)
Downloading botocore-1.8.38-py2.py3-none-any.whl (4.1MB)
100% |################################| 4.1MB 283kB/s
Collecting s3transfer<0.2.0,>=0.1.10 (from boto3->s3fs>=0.1.2->featuretools)
Downloading s3transfer-0.1.12-py2.py3-none-any.whl (59kB)
100% |################################| 61kB 2.3MB/s
Collecting jmespath<1.0.0,>=0.7.1 (from boto3->s3fs>=0.1.2->featuretools)
Downloading jmespath-0.9.3-py2.py3-none-any.whl
Requirement already satisfied: click>=6.6 in c:\users\cde3\anaconda3\lib\site-packages (from distributed>=1.10->dask[complete]->featuretools)
Collecting msgpack-python (from distributed>=1.10->dask[complete]->featuretools)
Downloading msgpack-python-0.5.4.tar.gz
Collecting zict>=0.1.3 (from distributed>=1.10->dask[complete]->featuretools)
Downloading zict-0.1.3-py2.py3-none-any.whl
Collecting tblib (from distributed>=1.10->dask[complete]->featuretools)
Downloading tblib-1.3.2-py2.py3-none-any.whl
Collecting sortedcontainers (from distributed>=1.10->dask[complete]->featuretools)
Downloading sortedcontainers-1.5.9-py2.py3-none-any.whl
Requirement already satisfied: psutil in c:\users\cde3\anaconda3\lib\site-packages (from distributed>=1.10->dask[complete]->featuretools)
Collecting tornado>=4.5.1 (from distributed>=1.10->dask[complete]->featuretools)
Downloading tornado-4.5.3-cp35-cp35m-win_amd64.whl (423kB)
100% |################################| 430kB 1.3MB/s
Requirement already satisfied: locket in c:\users\cde3\anaconda3\lib\site-packages (from partd>=0.3.5->dask[complete]->featuretools)
Requirement already satisfied: docutils>=0.10 in c:\users\cde3\anaconda3\lib\site-packages (from botocore<1.9.0,>=1.8.38->boto3->s3fs>=0.1.2->featuretools)
Requirement already satisfied: heapdict in c:\users\cde3\anaconda3\lib\site-packages (from zict>=0.1.3->distributed>=1.10->dask[complete]->featuretools)
Building wheels for collected packages: featuretools, toolz, future, pympler, msgpack-python
Running setup.py bdist_wheel for featuretools ... done
Stored in directory: C:\Users\cde3\AppData\Local\pip\Cache\wheels\29\5c\a1\32f45bdcb049077b9d1b9cee4942ba04318b7064446067a442
Running setup.py bdist_wheel for toolz ... done
Stored in directory: C:\Users\cde3\AppData\Local\pip\Cache\wheels\57\51\8a\433a9c0a2c65fc1b2a795ae036b932f3339a02e9ae88367659
Running setup.py bdist_wheel for future ... done
Stored in directory: C:\Users\cde3\AppData\Local\pip\Cache\wheels\c2\50\7c\0d83b4baac4f63ff7a765bd16390d2ab43c93587fac9d6017a
Running setup.py bdist_wheel for pympler ... done
Stored in directory: C:\Users\cde3\AppData\Local\pip\Cache\wheels\57\c6\8b\91d4e0ffb106e935ca145964d13c03619943430ba9344c0206
Running setup.py bdist_wheel for msgpack-python ... error
Complete output from command c:\users\cde3\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\cde3\AppData\Local\Temp\pip-build-8krgu5nc\msgpack-python\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d C:\Users\cde3\AppData\Local\Temp\tmps14z_apepip-wheel- --python-tag cp35:
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.5
creating build\lib.win-amd64-3.5\msgpack
copying msgpack\exceptions.py -> build\lib.win-amd64-3.5\msgpack
copying msgpack\fallback.py -> build\lib.win-amd64-3.5\msgpack
copying msgpack\_version.py -> build\lib.win-amd64-3.5\msgpack
copying msgpack\__init__.py -> build\lib.win-amd64-3.5\msgpack
running build_ext
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\cde3\AppData\Local\Temp\pip-build-8krgu5nc\msgpack-python\setup.py", line 136, in <module>
    'License :: OSI Approved :: Apache Software License',
  File "c:\users\cde3\anaconda3\lib\distutils\core.py", line 148, in setup
    dist.run_commands()
  File "c:\users\cde3\anaconda3\lib\distutils\dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "c:\users\cde3\anaconda3\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "c:\users\cde3\anaconda3\lib\site-packages\wheel\bdist_wheel.py", line 179, in run
    self.run_command('build')
  File "c:\users\cde3\anaconda3\lib\distutils\cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "c:\users\cde3\anaconda3\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "c:\users\cde3\anaconda3\lib\distutils\command\build.py", line 135, in run
    self.run_command(cmd_name)
  File "c:\users\cde3\anaconda3\lib\distutils\cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "c:\users\cde3\anaconda3\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "c:\users\cde3\anaconda3\lib\distutils\command\build_ext.py", line 307, in run
    force=self.force)
  File "c:\users\cde3\anaconda3\lib\distutils\ccompiler.py", line 1031, in new_compiler
    return klass(None, dry_run, force)
  File "c:\users\cde3\anaconda3\lib\distutils\cygwinccompiler.py", line 282, in __init__
    CygwinCCompiler.__init__(self, verbose, dry_run, force)
  File "c:\users\cde3\anaconda3\lib\distutils\cygwinccompiler.py", line 157, in __init__
    self.dll_libraries = get_msvcr()
  File "c:\users\cde3\anaconda3\lib\distutils\cygwinccompiler.py", line 86, in get_msvcr
    raise ValueError("Unknown MS Compiler version %s " % msc_ver)
ValueError: Unknown MS Compiler version 1900


Failed building wheel for msgpack-python
Running setup.py clean for msgpack-python
Successfully built featuretools toolz future pympler
Failed to build msgpack-python
Installing collected packages: numpy, scipy, pandas, jmespath, botocore, s3transfer, boto3, s3fs, tqdm, toolz, cloudpickle, future, pympler, featuretools, msgpack-python, zict, tblib, sortedcontainers, tornado, distributed
Found existing installation: numpy 1.12.1
Uninstalling numpy-1.12.1:
Successfully uninstalled numpy-1.12.1
Found existing installation: scipy 0.19.0
Uninstalling scipy-0.19.0:
Successfully uninstalled scipy-0.19.0
Found existing installation: pandas 0.18.1
Uninstalling pandas-0.18.1:
Successfully uninstalled pandas-0.18.1
Found existing installation: tqdm 4.11.2
Uninstalling tqdm-4.11.2:
Successfully uninstalled tqdm-4.11.2
Found existing installation: toolz 0.8.0
Uninstalling toolz-0.8.0:
Successfully uninstalled toolz-0.8.0
Found existing installation: cloudpickle 0.2.1
DEPRECATION: Uninstalling a distutils installed project (cloudpickle) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
Uninstalling cloudpickle-0.2.1:
Successfully uninstalled cloudpickle-0.2.1
Running setup.py install for msgpack-python ... error
Complete output from command c:\users\cde3\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\cde3\AppData\Local\Temp\pip-build-8krgu5nc\msgpack-python\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\cde3\AppData\Local\Temp\pip-vr6qr69y-record\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.5
creating build\lib.win-amd64-3.5\msgpack
copying msgpack\exceptions.py -> build\lib.win-amd64-3.5\msgpack
copying msgpack\fallback.py -> build\lib.win-amd64-3.5\msgpack
copying msgpack\_version.py -> build\lib.win-amd64-3.5\msgpack
copying msgpack\__init__.py -> build\lib.win-amd64-3.5\msgpack
running build_ext
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\cde3\AppData\Local\Temp\pip-build-8krgu5nc\msgpack-python\setup.py", line 136, in <module>
    'License :: OSI Approved :: Apache Software License',
  File "c:\users\cde3\anaconda3\lib\distutils\core.py", line 148, in setup
    dist.run_commands()
  File "c:\users\cde3\anaconda3\lib\distutils\dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "c:\users\cde3\anaconda3\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "c:\users\cde3\anaconda3\lib\site-packages\setuptools-27.2.0-py3.5.egg\setuptools\command\install.py", line 61, in run
  File "c:\users\cde3\anaconda3\lib\distutils\command\install.py", line 539, in run
    self.run_command('build')
  File "c:\users\cde3\anaconda3\lib\distutils\cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "c:\users\cde3\anaconda3\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "c:\users\cde3\anaconda3\lib\distutils\command\build.py", line 135, in run
    self.run_command(cmd_name)
  File "c:\users\cde3\anaconda3\lib\distutils\cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "c:\users\cde3\anaconda3\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "c:\users\cde3\anaconda3\lib\distutils\command\build_ext.py", line 307, in run
    force=self.force)
  File "c:\users\cde3\anaconda3\lib\distutils\ccompiler.py", line 1031, in new_compiler
    return klass(None, dry_run, force)
  File "c:\users\cde3\anaconda3\lib\distutils\cygwinccompiler.py", line 282, in __init__
    CygwinCCompiler.__init__(self, verbose, dry_run, force)
  File "c:\users\cde3\anaconda3\lib\distutils\cygwinccompiler.py", line 157, in __init__
    self.dll_libraries = get_msvcr()
  File "c:\users\cde3\anaconda3\lib\distutils\cygwinccompiler.py", line 86, in get_msvcr
    raise ValueError("Unknown MS Compiler version %s " % msc_ver)
ValueError: Unknown MS Compiler version 1900

----------------------------------------

Command "c:\users\cde3\anaconda3\python.exe -u -c "import setuptools, tokenize;file='C:\Users\cde3\AppData\Local\Temp\pip-build-8krgu5nc\msgpack-python\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record C:\Users\cde3\AppData\Local\Temp\pip-vr6qr69y-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\cde3\AppData\Local\Temp\pip-build-8krgu5nc\msgpack-python\

E:\GBMs xgboost catboost etc\code>

where_primitives that are not also specified under agg_primitives don't get used and hence result in warnings.warn(warning_msg, UnusedPrimitiveWarning)

featuretools==0.25.0

I am using the official customer churn prediction example from here.

For quick experimentation, I have subsetted the cutoff_times to include only two msno (IDs), like so:

cutoff_times_=cutoff_times.iloc[[33,34,21,22],:].reset_index(drop=True)

cutoff_times_ = cutoff_times_.rename(columns={'cutoff_time':'time'})

Then in cell 20, I notice that where-clause features are not built for any primitive in set(where_primitives) - set(agg_primitives). I also get warnings.warn(warning_msg, UnusedPrimitiveWarning) for every primitive that is in the where_primitives list but not in the agg_primitives list.

Attaching a few examples (I have changed the max_depth to 10 to make sure that insufficient depth is not the cause):
1.

feature_defs, _ = ft.dfs(entityset=es, target_entity='members',
                         agg_primitives=[],
                         trans_primitives=['month'],
                         cutoff_time_in_index=True,
                         cutoff_time=cutoff_times_,
                         where_primitives=['max'],
                         max_depth=10, features_only=False)

output: 
/home/nitin/miniconda3/envs/featuretools/lib/python3.9/site-packages/featuretools/synthesis/dfs.py:307: UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
  where_primitives: ['max']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible variable types for the primitive were found in the data.
  warnings.warn(warning_msg, UnusedPrimitiveWarning)

2.

feature_defs, _ = ft.dfs(entityset=es, target_entity='members',
                         agg_primitives=['sum'],
                         trans_primitives=['month'],
                         cutoff_time_in_index=True,
                         cutoff_time=cutoff_times_,
                         where_primitives=['max', 'min'],
                         max_depth=10, features_only=False)
output:
/home/nitin/miniconda3/envs/featuretools/lib/python3.9/site-packages/featuretools/synthesis/dfs.py:307: UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
  where_primitives: ['max', 'min']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible variable types for the primitive were found in the data.
  warnings.warn(warning_msg, UnusedPrimitiveWarning)

3.

feature_defs, _ = ft.dfs(entityset=es, target_entity='members',
                         agg_primitives=['sum', 'min'],
                         trans_primitives=['month'],
                         cutoff_time_in_index=True,
                         cutoff_time=cutoff_times_,
                         where_primitives=['max', 'min'],
                         max_depth=10, features_only=False)

output:
/home/nitin/miniconda3/envs/featuretools/lib/python3.9/site-packages/featuretools/synthesis/dfs.py:307: UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
  where_primitives: ['max']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible variable types for the primitive were found in the data.
  warnings.warn(warning_msg, UnusedPrimitiveWarning)

4.

feature_defs, _ = ft.dfs(entityset=es, target_entity='members',
                         agg_primitives=['sum', 'min', 'max'],
                         trans_primitives=['month'],
                         cutoff_time_in_index=True,
                         cutoff_time=cutoff_times_,
                         where_primitives=['max', 'min', 'sum', 'std'],
                         max_depth=10, features_only=False)
output:
/home/nitin/miniconda3/envs/featuretools/lib/python3.9/site-packages/featuretools/synthesis/dfs.py:307: UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
  where_primitives: ['std']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible variable types for the primitive were found in the data.
  warnings.warn(warning_msg, UnusedPrimitiveWarning)
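As the warning text hints ("not setting interesting values"), where-clause features are only generated for columns that have interesting values set on the entityset; listing a primitive in where_primitives is not enough on its own. A sketch against the 0.25-era API (the column name and values are assumptions for the churn entityset):

# Mark the categorical values worth building conditional features for.
es["transactions"]["payment_method_id"].interesting_values = [38, 40, 41]

# Or let featuretools infer interesting values automatically.
es.add_interesting_values(max_values=5)

With interesting values in place, DFS can emit features such as MAX(transactions.actual_amount_paid WHERE payment_method_id = 38) for the primitives listed in where_primitives.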
