Comments (9)

jmrichardson commented on May 28, 2024

Got past the above error by installing libpython (conda install libpython).

But now I am getting this error:


[AutoFeatRegression] The 3 step feature engineering process could generate up to 2864745 features.
[AutoFeatRegression] With 48573 data points this new feature matrix would use about 556.60 gb of space.
Step 1: transformation of original features
              0/             83
Traceback (most recent call last):
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 168, in _process_files
    retoutput = check_output(command, stderr=STDOUT)
  File "D:\Anaconda3\envs\quant\lib\subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "D:\Anaconda3\envs\quant\lib\subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['D:\\Anaconda3\\envs\\quant\\python.exe', 'setup.py', 'build_ext', '--inplace']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "D:\Anaconda3\envs\quant\lib\site-packages\IPython\core\interactiveshell.py", line 3291, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-2e784209153e>", line 5, in <module>
    df = model.fit_transform(X, y)
  File "D:\Anaconda3\envs\quant\lib\site-packages\autofeat\autofeat.py", line 249, in fit_transform
    self.feateng_steps, self.transformations)
  File "D:\Anaconda3\envs\quant\lib\site-packages\autofeat\feateng.py", line 249, in generate_features
    original_features.extend(apply_tranformations(original_features))
  File "D:\Anaconda3\envs\quant\lib\site-packages\autofeat\feateng.py", line 194, in apply_tranformations
    f = ufuncify(t, expr_temp)
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\core\cache.py", line 94, in wrapper
    retval = cfunc(*args, **kwargs)
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 1105, in ufuncify
    return code_wrapper.wrap_code(routines, helpers=helps)
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 828, in wrap_code
    self._process_files(routines)
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 172, in _process_files
    " ".join(command), e.output.decode('utf-8')))
sympy.utilities.autowrap.CodeWrapError: Error while executing command: D:\Anaconda3\envs\quant\python.exe setup.py build_ext --inplace. Command output is:
running build_ext
running build_src
build_src
building extension "wrapper_module_1" sources
build_src: building npy-pkg config files
Traceback (most recent call last):
  File "setup.py", line 14, in <module>
    setup(configuration=configuration)
  File "D:\Anaconda3\envs\quant\lib\site-packages\numpy\distutils\core.py", line 171, in setup
    return old_setup(**new_attr)
  File "D:\Anaconda3\envs\quant\lib\distutils\core.py", line 148, in setup
    dist.run_commands()
  File "D:\Anaconda3\envs\quant\lib\distutils\dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "D:\Anaconda3\envs\quant\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "D:\Anaconda3\envs\quant\lib\site-packages\numpy\distutils\command\build_ext.py", line 116, in run
    force=self.force)
  File "D:\Anaconda3\envs\quant\lib\site-packages\numpy\distutils\ccompiler.py", line 765, in new_compiler
    compiler = klass(None, dry_run, force)
  File "D:\Anaconda3\envs\quant\lib\site-packages\numpy\distutils\mingw32ccompiler.py", line 69, in __init__
    dry_run, force)
  File "D:\Anaconda3\envs\quant\lib\distutils\cygwinccompiler.py", line 129, in __init__
    if self.ld_version >= "2.10.90":
TypeError: '>=' not supported between instances of 'NoneType' and 'str'

from autofeat.

jmrichardson commented on May 28, 2024

Got past the issues above by installing Microsoft Visual Studio 2019. However, I am now running into this problem when running the example notebook:

[AutoFeatRegression] 7 features occurred in more than one featsel run.
[AutoFeatRegression] 4 new features selected.
[AutoFeatRegression] Computing 4 new features.
[AutoFeatRegression] Error while processing expression: '1/(-x.2 + 1/x/3)'
Traceback (most recent call last):
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 168, in _process_files
    retoutput = check_output(command, stderr=STDOUT)
  File "D:\Anaconda3\envs\quant\lib\subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "D:\Anaconda3\envs\quant\lib\subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['D:\\Anaconda3\\envs\\quant\\python.exe', 'setup.py', 'build_ext', '--inplace']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "D:\Anaconda3\envs\quant\lib\site-packages\IPython\core\interactiveshell.py", line 3291, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-d7f03eebbf94>", line 19, in <module>
    df = afreg.fit_transform(pd.DataFrame(X, columns=["x 1", "x.2", "x/3"]), target)
  File "D:\Anaconda3\envs\quant\lib\site-packages\autofeat\autofeat.py", line 287, in fit_transform
    df = self._generate_features(df, good_cols)
  File "D:\Anaconda3\envs\quant\lib\site-packages\autofeat\autofeat.py", line 172, in _generate_features
    f = ufuncify((self.feature_formulas[c] for c in cols), self.feature_formulas[expr])
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\core\cache.py", line 94, in wrapper
    retval = cfunc(*args, **kwargs)
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 1105, in ufuncify
    return code_wrapper.wrap_code(routines, helpers=helps)
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 828, in wrap_code
    self._process_files(routines)
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\utilities\autowrap.py", line 172, in _process_files
    " ".join(command), e.output.decode('utf-8')))
sympy.utilities.autowrap.CodeWrapError: Error while executing command: D:\Anaconda3\envs\quant\python.exe setup.py build_ext --inplace. Command output is:
running build_ext
running build_src
build_src
building extension "wrapper_module_11" sources
build_src: building npy-pkg config files
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
customize MSVCCompiler using build_ext
building 'wrapper_module_11' extension
compiling C sources
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.20.27508\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -ID:\Anaconda3\envs\quant\lib\site-packages\numpy\core\include -ID:\Anaconda3\envs\quant\include -ID:\Anaconda3\envs\quant\include -IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.20.27508\include -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt /Tcwrapper_module_11.c /Fobuild\temp.win-amd64-3.6\Release\wrapper_module_11.obj
wrapper_module_11.c
D:\Anaconda3\envs\quant\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
C:\Users\john\AppData\Local\Temp\tmpkan8rgui_sympy_compile\wrapped_code_11.h(3): error C2143: syntax error: missing ')' before 'constant'
C:\Users\john\AppData\Local\Temp\tmpkan8rgui_sympy_compile\wrapped_code_11.h(3): error C2143: syntax error: missing '{' before 'constant'
C:\Users\john\AppData\Local\Temp\tmpkan8rgui_sympy_compile\wrapped_code_11.h(3): error C2059: syntax error: 'constant'
C:\Users\john\AppData\Local\Temp\tmpkan8rgui_sympy_compile\wrapped_code_11.h(3): error C2059: syntax error: ')'
wrapper_module_11.c(23): warning C4013: 'autofunc0' undefined; assuming extern returning int
error: Command "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.20.27508\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -ID:\Anaconda3\envs\quant\lib\site-packages\numpy\core\include -ID:\Anaconda3\envs\quant\include -ID:\Anaconda3\envs\quant\include -IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.20.27508\include -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt /Tcwrapper_module_11.c /Fobuild\temp.win-amd64-3.6\Release\wrapper_module_11.obj" failed with exit status 2


cod3licious commented on May 28, 2024

Sorry for all the trouble! This is due to a sympy bug: sympy/sympy#16371. I'm working on an alternative, and in 1-2 days I will push a new version of the code that should work without this (and also include some other bug fixes and stability improvements). I'll post again once it's out, and then this should hopefully be fixed! :)


jmrichardson commented on May 28, 2024

No problem and thanks for sharing. I also tried running the code on my dataset and ran into the below error. I do have null values in my data. Is that a problem? Also, my DF has 83 features and ~40K rows. Is there a way to control how many features are generated? I am thinking I am going to run out of resources...

from autofeat import AutoFeatRegression
model = AutoFeatRegression()
df = model.fit_transform(X, y)
[AutoFeatRegression] The 3 step feature engineering process could generate up to 2864745 features.
[AutoFeatRegression] With 48573 data points this new feature matrix would use about 556.60 gb of space.
Step 1: transformation of original features
              0/             83
Traceback (most recent call last):
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\core\cache.py", line 94, in wrapper
    retval = cfunc(*args, **kwargs)
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\core\function.py", line 277, in __new__
    evaluated = cls.eval(*args)
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\functions\elementary\complexes.py", line 454, in eval
    raise TypeError("Bad argument type for Abs(): %s" % type(arg))
TypeError: Bad argument type for Abs(): <class 'sympy.core.containers.Tuple'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "D:\Anaconda3\envs\quant\lib\site-packages\sympy\core\cache.py", line 94, in wrapper
    retval = cfunc(*args, **kwargs)
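
For a rough sense of the resources involved: the space estimate printed in the log above is consistent with a dense matrix stored at 4 bytes per value, counting 1 GB as 1e9 bytes. That is an assumption inferred from the numbers, not something stated in the thread, and feature_matrix_gb is a hypothetical helper, not part of autofeat. A minimal sketch:

```python
BYTES_PER_VALUE = 4  # assumption: 32-bit floats; not confirmed in the thread


def feature_matrix_gb(n_rows, n_features, bytes_per_value=BYTES_PER_VALUE):
    """Size of a dense n_rows x n_features matrix in GB (1 GB = 1e9 bytes)."""
    return n_rows * n_features * bytes_per_value / 1e9


# Numbers taken from the log: 48573 data points, up to 2864745 features.
print(f"{feature_matrix_gb(48573, 2864745):.2f} GB")  # 556.60 GB
# Fitting on fewer rows shrinks the matrix proportionally.
print(f"{feature_matrix_gb(5000, 2864745):.2f} GB")
```

Under this assumption the size scales linearly with both the number of rows and the number of candidate features, so reducing either one helps.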


cod3licious commented on May 28, 2024

Yes, the data should only contain finite values.
To limit the memory usage, you can do several things:

  • By specifying feateng_steps, you can limit the number of transformations and combinations that are performed.
  • By specifying feateng_cols, you can say which of the features should be used for the feature engineering, in case you already know that some features are not really of interest.
  • By specifying transformations, you can limit the kinds of transformations that are performed in the feature engineering part; maybe some don't really make sense for your data?
  • Last but not least: pass fewer rows to the model when fitting it. You can always call transform on the whole dataset after the fitting to generate just the few selected features for all rows, and then use these to train your own model. This saves a lot of space. The one thing to be careful about is that the data points you do pass must cover a representative range of the values each variable can take. Some features are only computed if, e.g., all data points are > 0 for a specific variable (as for log); if this is the case during training but you later transform new data with values outside of these ranges, this will result in an error.
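
The last point can be sketched in plain Python. This is only an illustration under assumptions: pick_fit_rows is a hypothetical helper, not part of autofeat, and the data is taken to be a list of rows of floats. It keeps only rows with finite values, draws a random sample, and then adds the rows that hold each column's minimum and maximum so the sample spans the observed range of every variable.

```python
import math
import random


def pick_fit_rows(rows, n_sample, seed=0):
    """Return sorted indices of the rows to use for fitting.

    rows: list of equal-length lists of floats (hypothetical data layout).
    Only rows whose values are all finite are eligible. A random sample of
    them is extended with the rows holding each column's min and max.
    """
    finite = [i for i, row in enumerate(rows)
              if all(math.isfinite(v) for v in row)]
    if not finite:
        raise ValueError("no row is free of NaN/inf values")
    rng = random.Random(seed)
    chosen = set(rng.sample(finite, min(n_sample, len(finite))))
    for col in range(len(rows[0])):
        # Extremes are taken over the finite rows only.
        chosen.add(min(finite, key=lambda i: rows[i][col]))
        chosen.add(max(finite, key=lambda i: rows[i][col]))
    return sorted(chosen)


rows = [[1.0, 5.0], [float("nan"), 2.0], [3.0, -1.0],
        [2.0, 9.0], [0.5, 3.0], [10.0, 4.0]]
print(pick_fit_rows(rows, n_sample=2))
```

Note that the extremes are computed over the complete rows only, so a row that is dropped because of a NaN in one column may still hold an extreme value in another; new data outside the fitted ranges can still cause the error described above.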


jmrichardson commented on May 28, 2024

That sounds great! I am curious if you are planning on dealing with missing values in a future release? I am currently using featuretools and tsfresh, which do handle NaNs. I could impute but with my data set, it would introduce bias.


cod3licious commented on May 28, 2024

I can have a look at it, but I'm relying on sklearn models, especially for the feature selection part, and they can't handle NaNs. So at least for the fitting part, NaNs would need to be replaced or ignored. I could change the transformation part to only work on the non-NaN values and leave the NaNs in place, but you still couldn't use predict with the underlying sklearn model.
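
As a rough illustration of "only work on the non-NaNs and leave the NaNs in place", here is a minimal plain-Python sketch. transform_keep_nan is a hypothetical helper for this example and is not how autofeat implements it.

```python
import math


def transform_keep_nan(values, func):
    """Apply func to every non-NaN value and leave NaNs where they are."""
    return [v if math.isnan(v) else func(v) for v in values]


column = [1.0, float("nan"), 4.0]
# The NaN in position 1 is passed through untouched.
print(transform_keep_nan(column, math.sqrt))
```

The output column still contains NaNs, which is why it could feed a transform step but not an sklearn model's fit or predict.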


jmrichardson commented on May 28, 2024

I see, that makes sense. Thanks, I appreciate the response.


cod3licious commented on May 28, 2024

@jmrichardson alright, version 0.2 is out! This should fix your original error and additionally allows for NaNs in the transform function (in fit this is not possible due to the dependence on the sklearn model, but I'm sure in this huge dataset you have you'll find a few rows that don't contain NaNs anywhere). I'm closing this issue now - let me know if there are any other problems! :)
