Comments (9)
Got past the above error by installing: conda install libpython
But now getting this error:
[AutoFeatRegression] The 3 step feature engineering process could generate up to 2864745 features.
[AutoFeatRegression] With 48573 data points this new feature matrix would use about 556.60 gb of space.
Step 1: transformation of original features
0/ 83
Traceback (most recent call last):
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 168, in _process_files
retoutput = check_output(command, stderr=STDOUT)
File "D:\Anaconda3\envs\quant\lib\subprocess.py", line 356, in check_output
**kwargs).stdout
File "D:\Anaconda3\envs\quant\lib\subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['D:\\Anaconda3\\envs\\quant\\python.exe', 'setup.py', 'build_ext', '--inplace']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Anaconda3\envs\quant\lib\site-packages\IPython\core\interactiveshell.py", line 3291, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-6-2e784209153e>", line 5, in <module>
df = model.fit_transform(X, y)
File "D:\Anaconda3\envs\quant\lib\site-packages\autofeat\autofeat.py", line 249, in fit_transform
self.feateng_steps, self.transformations)
File "D:\Anaconda3\envs\quant\lib\site-packages\autofeat\feateng.py", line 249, in generate_features
original_features.extend(apply_tranformations(original_features))
File "D:\Anaconda3\envs\quant\lib\site-packages\autofeat\feateng.py", line 194, in apply_tranformations
f = ufuncify(t, expr_temp)
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\core\cache.py", line 94, in wrapper
retval = cfunc(*args, **kwargs)
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 1105, in ufuncify
return code_wrapper.wrap_code(routines, helpers=helps)
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 828, in wrap_code
self._process_files(routines)
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 172, in _process_files
" ".join(command), e.output.decode('utf-8')))
sympy.utilities.autowrap.CodeWrapError: Error while executing command: D:\Anaconda3\envs\quant\python.exe setup.py build_ext --inplace. Command output is:
running build_ext
running build_src
build_src
building extension "wrapper_module_1" sources
build_src: building npy-pkg config files
Traceback (most recent call last):
File "setup.py", line 14, in <module>
setup(configuration=configuration)
File "D:\Anaconda3\envs\quant\lib\site-packages\numpy\distutils\core.py", line 171, in setup
return old_setup(**new_attr)
File "D:\Anaconda3\envs\quant\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "D:\Anaconda3\envs\quant\lib\distutils\dist.py", line 955, in run_commands
self.run_command(cmd)
File "D:\Anaconda3\envs\quant\lib\distutils\dist.py", line 974, in run_command
cmd_obj.run()
File "D:\Anaconda3\envs\quant\lib\site-packages\numpy\distutils\command\build_ext.py", line 116, in run
force=self.force)
File "D:\Anaconda3\envs\quant\lib\site-packages\numpy\distutils\ccompiler.py", line 765, in new_compiler
compiler = klass(None, dry_run, force)
File "D:\Anaconda3\envs\quant\lib\site-packages\numpy\distutils\mingw32ccompiler.py", line 69, in __init__
dry_run, force)
File "D:\Anaconda3\envs\quant\lib\distutils\cygwinccompiler.py", line 129, in __init__
if self.ld_version >= "2.10.90":
TypeError: '>=' not supported between instances of 'NoneType' and 'str'
from autofeat.
Got past the issues above by installing Microsoft Visual Studio 2019. However, now I am running into this problem (running the example notebook):
[AutoFeatRegression] 7 features occurred in more than one featsel run.
[AutoFeatRegression] 4 new features selected.
[AutoFeatRegression] Computing 4 new features.
[AutoFeatRegression] Error while processing expression: '1/(-x.2 + 1/x/3)'
Traceback (most recent call last):
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 168, in _process_files
retoutput = check_output(command, stderr=STDOUT)
File "D:\Anaconda3\envs\quant\lib\subprocess.py", line 356, in check_output
**kwargs).stdout
File "D:\Anaconda3\envs\quant\lib\subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['D:\\Anaconda3\\envs\\quant\\python.exe', 'setup.py', 'build_ext', '--inplace']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Anaconda3\envs\quant\lib\site-packages\IPython\core\interactiveshell.py", line 3291, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-d7f03eebbf94>", line 19, in <module>
df = afreg.fit_transform(pd.DataFrame(X, columns=["x 1", "x.2", "x/3"]), target)
File "D:\Anaconda3\envs\quant\lib\site-packages\autofeat\autofeat.py", line 287, in fit_transform
df = self._generate_features(df, good_cols)
File "D:\Anaconda3\envs\quant\lib\site-packages\autofeat\autofeat.py", line 172, in _generate_features
f = ufuncify((self.feature_formulas[c] for c in cols), self.feature_formulas[expr])
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\core\cache.py", line 94, in wrapper
retval = cfunc(*args, **kwargs)
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 1105, in ufuncify
return code_wrapper.wrap_code(routines, helpers=helps)
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 828, in wrap_code
self._process_files(routines)
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 172, in _process_files
" ".join(command), e.output.decode('utf-8')))
sympy.utilities.autowrap.CodeWrapError: Error while executing command: D:\Anaconda3\envs\quant\python.exe setup.py build_ext --inplace. Command output is:
running build_ext
running build_src
build_src
building extension "wrapper_module_11" sources
build_src: building npy-pkg config files
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
customize MSVCCompiler using build_ext
building 'wrapper_module_11' extension
compiling C sources
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.20.27508\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -ID:\Anaconda3\envs\quant\lib\site-packages\numpy\core\include -ID:\Anaconda3\envs\quant\include -ID:\Anaconda3\envs\quant\include -IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.20.27508\include -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt /Tcwrapper_module_11.c /Fobuild\temp.win-amd64-3.6\Release\wrapper_module_11.obj
wrapper_module_11.c
D:\Anaconda3\envs\quant\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
C:\Users\john\AppData\Local\Temp\tmpkan8rgui_sympy_compile\wrapped_code_11.h(3): error C2143: syntax error: missing ')' before 'constant'
C:\Users\john\AppData\Local\Temp\tmpkan8rgui_sympy_compile\wrapped_code_11.h(3): error C2143: syntax error: missing '{' before 'constant'
C:\Users\john\AppData\Local\Temp\tmpkan8rgui_sympy_compile\wrapped_code_11.h(3): error C2059: syntax error: 'constant'
C:\Users\john\AppData\Local\Temp\tmpkan8rgui_sympy_compile\wrapped_code_11.h(3): error C2059: syntax error: ')'
wrapper_module_11.c(23): warning C4013: 'autofunc0' undefined; assuming extern returning int
error: Command "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.20.27508\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -ID:\Anaconda3\envs\quant\lib\site-packages\numpy\core\include -ID:\Anaconda3\envs\quant\include -ID:\Anaconda3\envs\quant\include -IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.20.27508\include -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt /Tcwrapper_module_11.c /Fobuild\temp.win-amd64-3.6\Release\wrapper_module_11.obj" failed with exit status 2
Sorry for all the trouble! This is due to a sympy bug: sympy/sympy#16371 but I'm working on an alternative and in 1-2 days will push a new version of the code that should work without this (and also include some other bug fixes and stability improvements). I'll post again once it's out and then this should hopefully fix this! :)
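The compiler output above fails on the generated header right where the notebook's column names ("x 1", "x.2", "x/3") would appear as C function arguments. Assuming that is the trigger (the thread itself only attributes the failure to the linked sympy bug), one possible stopgap is to rename columns to valid C identifiers before fitting. The helper names below are hypothetical and not part of autofeat or sympy:

```python
import re


def to_c_identifier(name: str) -> str:
    """Map an arbitrary column name to a valid C identifier."""
    # Replace every character outside [A-Za-z0-9_] with an underscore.
    cleaned = re.sub(r"\W", "_", name, flags=re.ASCII)
    # C identifiers must not be empty or start with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def sanitize_columns(names):
    """Return {original: sanitized}, adding a numeric suffix on collisions."""
    mapping, used = {}, set()
    for name in names:
        candidate = base = to_c_identifier(name)
        suffix = 1
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping


print(sanitize_columns(["x 1", "x.2", "x/3"]))
# {'x 1': 'x_1', 'x.2': 'x_2', 'x/3': 'x_3'}
```

With pandas, the resulting mapping could be applied through `df.rename(columns=mapping)` before calling `fit_transform`.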
No problem and thanks for sharing. I also tried running the code on my dataset and ran into the below error. I do have null values in my data. Is that a problem? Also, my DF has 83 features and ~40K rows. Is there a way to control how many features are generated? I am thinking I am going to run out of resources...
from autofeat import AutoFeatRegression
model = AutoFeatRegression()
df = model.fit_transform(X, y)
[AutoFeatRegression] The 3 step feature engineering process could generate up to 2864745 features.
[AutoFeatRegression] With 48573 data points this new feature matrix would use about 556.60 gb of space.
Step 1: transformation of original features
0/ 83
Traceback (most recent call last):
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\core\cache.py", line 94, in wrapper
retval = cfunc(*args, **kwargs)
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\core\function.py", line 277, in __new__
evaluated = cls.eval(*args)
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\functions\elementary\complexes.py", line 454, in eval
raise TypeError("Bad argument type for Abs(): %s" % type(arg))
TypeError: Bad argument type for Abs(): <class 'sympy.core.containers.Tuple'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\core\cache.py", line 94, in wrapper
retval = cfunc(*args, **kwargs)
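As a sanity check on the resource concern, the 556.60 gb figure in the log can be reproduced from the two numbers it reports. The sketch below assumes 4 bytes per value (float32) and 1 gb = 10^9 bytes; both are inferred from the arithmetic, not taken from the autofeat source, and the function name is made up for illustration:

```python
def feature_matrix_gb(n_rows: int, n_features: int, bytes_per_value: int = 4) -> float:
    """Size of a dense n_rows x n_features matrix in gb (10**9 bytes)."""
    return n_rows * n_features * bytes_per_value / 1e9


# Numbers taken from the log output above.
print(f"{feature_matrix_gb(48573, 2864745):.2f} gb")  # 556.60 gb
```

The estimate grows linearly in both the number of rows and the number of candidate features, so reducing either one shrinks it proportionally.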
Yes, the data should only contain finite values.
To limit the memory you can do several things:
- By specifying feateng_steps, you can limit the number of transformations and combinations that are being performed.
- By specifying feateng_cols, you can say which of the features should be used for the feature engineering, in case you already know that some features might not really be of interest.
- By specifying transformations, you can limit the kinds of transformations that are being performed in the feature engineering part; maybe some don't really make sense for your data?
- Last but not least: just pass fewer rows to the model when fitting it; you can always call transform on the whole dataset after the fitting to just generate the few selected features for the whole dataset and then use this to train your own model. This will save lots of space. The only thing where you have to be careful is that the data points you do pass must contain a representative range of the values each variable can take: some of the features are only computed if, e.g., all data points are > 0 for a specific variable (e.g. log); if this is the case during training, but you later transform new data with values outside of these ranges, this will result in an error.
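The options above can be sketched as follows. `representative_rows` is a hypothetical helper (not part of autofeat) that always keeps the rows holding each column's minimum and maximum, so the fitted subset spans the full value range; it assumes all values are finite. The constructor arguments in `fit_on_subset` are illustrative values only, the column names are made up, and the function is not executed here because it needs autofeat installed; check the parameters against your installed version.

```python
import random


def representative_rows(rows, n_sample, seed=0):
    """Pick about n_sample row indices, always including each column's extremes."""
    keep = set()
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        keep.add(column.index(min(column)))  # row holding the column minimum
        keep.add(column.index(max(column)))  # row holding the column maximum
    rest = [i for i in range(len(rows)) if i not in keep]
    n_extra = max(0, min(n_sample - len(keep), len(rest)))
    # Fill the remaining budget with a reproducible random draw.
    keep.update(random.Random(seed).sample(rest, n_extra))
    return sorted(keep)


def fit_on_subset(X_small, y_small, X_all):
    """Fit on a few rows, then compute only the selected features for all rows."""
    from autofeat import AutoFeatRegression  # lazy import: requires autofeat

    model = AutoFeatRegression(
        feateng_steps=2,                      # fewer steps, fewer combinations
        feateng_cols=["x1", "x2"],            # hypothetical column names
        transformations=("1/", "log", "^2"),  # illustrative subset
    )
    model.fit_transform(X_small, y_small)
    return model.transform(X_all)


rows = [[1.0, 10.0], [5.0, -2.0], [3.0, 4.0], [0.5, 7.0], [2.0, 2.0]]
idx = representative_rows(rows, n_sample=4)
print(idx)  # always contains rows 0, 1 and 3, which hold the column extremes
```

Keeping the extremes in the subset is one simple way to reduce the risk that a transformation valid during fitting (such as log on strictly positive values) later meets out-of-range data in transform.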
That sounds great! I am curious if you are planning on dealing with missing values in a future release? I am currently using featuretools and tsfresh, which do handle NaNs. I could impute but with my data set, it would introduce bias.
I can have a look at it, but since I'm relying on sklearn models, especially for the feature selection part, and they can't handle NaNs, at least for the fitting part, NaNs would need to be replaced/ignored. I could change the transformation part though to only work on the non-NaNs and leave the NaNs in place, but you still couldn't use predict with the underlying sklearn model
I see.. that makes sense. Thanks and appreciate the response
@jmrichardson alright, version 0.2 is out! This should fix your original error and additionally allows for NaNs in the transform function (in fit this is not possible due to the dependence on the sklearn model, but I'm sure in this huge dataset you have you'll find a few rows that don't contain NaNs anywhere). I'm closing this issue now - let me know if there are any other problems! :)
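Picking the rows that contain no NaNs for fitting could look like the minimal sketch below. It uses plain Python lists so it runs without extra dependencies, and the helper name is hypothetical; infinite values are excluded as well, since the data is expected to be finite.

```python
import math


def finite_row_indices(rows):
    """Indices of rows in which every value is finite (no NaN, no inf)."""
    return [i for i, row in enumerate(rows) if all(math.isfinite(v) for v in row)]


rows = [
    [1.0, 2.0],
    [float("nan"), 3.0],   # dropped: contains NaN
    [4.0, float("inf")],   # dropped: contains inf
    [5.0, 6.0],
]
fit_idx = finite_row_indices(rows)
print(fit_idx)  # [0, 3]
```

The model would then be fitted on the rows in `fit_idx` only, while transform is applied to the full dataset. For an all-numeric pandas DataFrame `X`, the same mask is commonly written as `np.isfinite(X).all(axis=1)`.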