heal-research / pyoperon

Python bindings and scikit-learn interface for the Operon library for symbolic regression.

License: MIT License

CMake 9.67% C++ 38.02% Python 21.66% Nix 3.23% Jupyter Notebook 24.57% Shell 2.85%

genetic-programming machine-learning parallel python sklearn-compatible symbolic-regression

pyoperon's Issues

operon-sklearn.py example not working - segfault in MinimumDescriptionLengthEvaluator

Hello,

After successful installation, running the examples that use sklearn results in a segmentation fault. After some debugging, I narrowed the error down to running the MinimumDescriptionLengthEvaluator on the results to obtain the best model. If I comment out that part in sklearn.py and just obtain the Pareto front, or use BIC or AIC, everything works fine. Other examples also work perfectly.

This error has been reproduced on several different machines. Have you ever encountered it?
Thank you very much for any comments,
Best,
David

Calling the optimizer is unnecessarily complicated

Right now, optimizing a tree with pyoperon requires a lot of ugly code:

def evaluate_with_pyoperon(pdata, tree, range_train, range_test):
    a, b = range_train
    c, d = range_test

    # pyoperon
    pyop_dataset  = op.Dataset(pdata.values)
    pyop_dataset.VariableNames = pdata.columns
    pyop_range_tr = op.Range(a, b)
    pyop_range_te = op.Range(c, d)
    pyop_vars     = sorted(pyop_dataset.Variables, key=lambda v: v.Index)
    pyop_hashes   = [v.Hash for v in pyop_vars[:-1]]
    pyop_target   = pyop_vars[-1]
    pyop_problem  = op.Problem(pyop_dataset, pyop_range_tr, pyop_range_te)
    pyop_problem.InputHashes = pyop_hashes
    pyop_problem.Target = pyop_target
    pyop_dt       = op.DispatchTable()
    pyop_opt      = op.LMOptimizer(pyop_dt, pyop_problem, max_iter=20)
    rng = op.RomuTrio(np.random.randint(1, 1_000_000))

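    summary = pyop_opt.Optimize(rng, tree)

    # apply the optimized coefficients only if the optimizer succeeded
    if summary.Success:
        tree.SetCoefficients(summary.FinalParameters)

    # evaluate the (possibly updated) tree on all rows of the dataset
    range_full = op.Range(0, pyop_dataset.Rows)
    return op.Evaluate(pyop_dt, tree, pyop_dataset, range_full)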

This should not be so complicated. At the very least, if pdata is a dataframe, we should hide the construction of the dataset and problem and offer a simplified API.
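
One possible shape for such a helper is sketched below, using only the calls that already appear in the snippet above. The name build_problem and the idea of passing the imported module in as `op` are assumptions made for illustration, not an existing pyoperon API. It also assumes, as the snippet does, that the last column of pdata is the target.

```python
def build_problem(op, pdata, range_train, range_test):
    """Hide the dataset/problem boilerplate behind a single call.

    `op` is the imported pyoperon module (passed in explicitly here; this is
    an illustrative choice). `pdata` is a DataFrame whose last column is
    assumed to be the target.
    """
    dataset = op.Dataset(pdata.values)
    dataset.VariableNames = list(pdata.columns)

    # Order variables by column index so that the last one is the target.
    variables = sorted(dataset.Variables, key=lambda v: v.Index)

    problem = op.Problem(dataset, op.Range(*range_train), op.Range(*range_test))
    problem.InputHashes = [v.Hash for v in variables[:-1]]
    problem.Target = variables[-1]
    return dataset, problem
```

With something like this in place, the function above would shrink to the optimizer call and the final evaluation.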

Segmentation fault when working with MultiEvaluator on Mac M1

Hi,

I've been able to isolate a segmentation fault to the use of Operon.MultiEvaluator, specifically when you add more than one evaluator to the MultiEvaluator object.

Simple reproducible example (replacing the Operon.Evaluator() definition in example/operon-bindings.py):

evaluator = Operon.MultiEvaluator(problem)
for i in range(2): # works fine if changed to range(1)
    evaluator_i      = Operon.Evaluator(problem, dtable, error_metric, True) # initialize evaluator, use linear scaling = True
    evaluator_i.Budget = 1000 * 1000             # computational budget
    optimizer      = Operon.LMOptimizer(dtable, problem, max_iter=3)
    evaluator_i.Optimizer = optimizer
    evaluator.Add(evaluator_i)

aggregateEvaluator = Operon.AggregateEvaluator(evaluator)
aggregateEvaluator.AggregateType = Operon.AggregateType.Max

# define how new offspring are created
generator      = Operon.BasicOffspringGenerator(aggregateEvaluator, crossover, mutation, selector, selector)

Not sure if this bug appears on Linux machines as well. Segfault still occurs when not applying aggregateEvaluator and instead feeding evaluator directly into the last line, so the issue does not seem to be with AggregateEvaluator. Using Python 3.11, and working on a MacBook M1 Pro with Sonoma 14.5 Beta. Installed using git clone + pip instructions.

ModuleNotFoundError: No module named 'operon.pyoperon'

Hi @foolnotion,

I am trying to run the srbench suite for a paper I am writing on my code PySR, but I am having difficulty setting up operon. I first tried the conda build script in srbench for operon but couldn't get it to work due to various issues when building eve. It seems unable to interpret what float_ and the other custom category::{type} types mean unless they are explicitly written as category::float_, etc. I am not sure why this error occurs.

I decided it would be too difficult to manually fix those issues, so yesterday I decided instead to try out nix using nix-portable since this looks like the recommended approach for building operon. I was able to set things up with nix-portable, and things build correctly on my cluster.

However, when I try to actually import operon, I see the following issue:

[worker5026 shm]$ python -c 'from operon.sklearn import SymbolicRegressor'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/nix/store/37yavbixv6larr8yiaf8x7j5lksg74nz-pyoperon/operon/__init__.py", line 4, in <module>
    from .pyoperon import *
ImportError: /mnt/sw/nix/store/kcrf6n4dmr5blhw2hzfy1j588bri8dzw-gcc-10.3.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /nix/store/37yavbixv6larr8yiaf8x7j5lksg74nz-pyoperon/operon/pyoperon.cpython-39-x86_64-linux-gnu.so)

I took a look at https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html and found that I need gcc 11.1+ loaded for this GLIBCXX version, rather than gcc-10.3.0. So, I did module unload gcc && module load gcc/11.2.0 to get the correct paths set up. Then I tried again, but now it gives me the following error:

[worker5026 shm]$ python -c 'from operon.sklearn import SymbolicRegressor'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/nix/store/37yavbixv6larr8yiaf8x7j5lksg74nz-pyoperon/operon/__init__.py", line 4, in <module>
    from .pyoperon import *
ModuleNotFoundError: No module named 'operon.pyoperon'

Any idea how to fix this?
Thanks!
Miles
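
A quick way to confirm which GLIBCXX versions a given libstdc++.so.6 actually provides is to scan the file for its embedded version tags, which is roughly what the common `strings libstdc++.so.6 | grep GLIBCXX` check does. The sketch below does the same in Python; the function names are made up for illustration, and the path in the usage comment is taken from the first traceback above.

```python
import re


def glibcxx_versions(lib_path):
    """Return the GLIBCXX version tags found in a shared library, sorted numerically."""
    with open(lib_path, "rb") as fh:
        data = fh.read()
    # Version tags are stored as plain ASCII strings such as b"GLIBCXX_3.4.29".
    tags = {m.decode() for m in re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data)}
    # Sort by numeric components so that 3.4.28 comes after 3.4.9.
    return sorted(tags, key=lambda v: tuple(int(p) for p in v.split(".")))


def provides(lib_path, required):
    """True if the library defines the required version, e.g. '3.4.29'."""
    return required in glibcxx_versions(lib_path)


# Example (path from the traceback above; adjust to the library actually loaded):
# provides("/mnt/sw/nix/store/kcrf6n4dmr5blhw2hzfy1j588bri8dzw-gcc-10.3.0/lib64/libstdc++.so.6", "3.4.29")
```

If the required tag is missing, the Python process is picking up an older libstdc++ than the one the extension module was built against.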

explicit clang compiler specification required under conda on Linux

greetings from Prague;

We tried to install pyoperon on our HPC cluster (Rocky Linux 8.9) following the README.md instructions. Without explicitly specifying:

export CXX=${CONDA_PREFIX}/bin/clang++
export CC=${CONDA_PREFIX}/bin/clang

the build fails.

Investigation revealed that conda uses the gcc compiler suite by default, and gcc does not handle this code well. Namely, there are multiple instances of errors similar to:

/home/jose/projects/pyoperon_v4/pyoperon/operon/include/operon/interpreter/dispatch_table.hpp:111:2: error: extra ';' [-Werror=pedantic]
  111 | };

and

/home/jose/projects/pyoperon_v4/pyoperon/operon/include/operon/interpreter/dispatch_table.hpp:165:71: internal compiler error: Segmentation fault
  165 |     using Typ = std::conditional_t<detail::ExtentsLike<Lst>, decltype([]<auto... Idx>(std::index_sequence<Idx...>){
      |                                                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  166 |                     return std::make_tuple(std::tuple_element_t<Idx, Tup>{}...);
      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  167 |                 }(std::make_index_sequence<sizeof...(Ts)-1>{})), Tup>;

I tested multiple versions of GCC, and after getting a better understanding of the codebase I concluded (hopefully correctly) that the code is supposed to be built with clang.

specifying

export CXX=${CONDA_PREFIX}/bin/clang++
export CC=${CONDA_PREFIX}/bin/clang

makes the build work flawlessly, so perhaps it would be useful to add this hint to README.md, or possibly directly to ./script/dependencies.sh?

cheers

josef

TypeError: cannot pickle 'pyoperon.pyoperon.Variable' object

I'm trying to save a pyoperon model using pickle:

import pickle

from pyoperon.sklearn import SymbolicRegressor

reg = SymbolicRegressor()
reg.fit(X_train, y_train)

# write the fitted regressor to disk
filename = 'operon_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(reg, f)

but I get the following error:

TypeError: cannot pickle 'pyoperon.pyoperon.Variable' object

Can you help me with this issue?
Thanks
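
For context, pickle raises this kind of TypeError whenever an object holds a member that does not support pickling, which is common for objects backed by native code. A general Python workaround is to drop such members in `__getstate__` and rebuild them in `__setstate__`. The sketch below only illustrates that pattern: ModelWrapper is a hypothetical class, and a threading.Lock stands in for the unpicklable native object. It is not how pyoperon itself is implemented or fixed.

```python
import pickle
import threading


class ModelWrapper:
    """Hypothetical estimator-like class holding one unpicklable member."""

    def __init__(self, expression):
        self.expression = expression       # plain string, picklable
        self._native = threading.Lock()    # stand-in for a native object that pickle rejects

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_native", None)         # leave out what pickle cannot handle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._native = threading.Lock()    # recreate the native member after loading


restored = pickle.loads(pickle.dumps(ModelWrapper("x1 + 2 * x2")))
print(restored.expression)
```

The same idea applies to a real estimator only if the dropped member can be reconstructed from the picklable state that remains.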

Can't build on Ubuntu

I'm trying to build the pyoperon package by following the instructions in BUILDING.md. I'm using vcpkg to install dependencies and managed to install all required dependencies, including the C++ operon version.

When I try to build pyoperon it gives me the following error:

guilherme@mini-ITX:~/Desktop/pyoperon$ cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/opt/vcpkg/scripts/buildsystems/vcpkg.cmake
-- Running vcpkg install
-- Running vcpkg install - failed
CMake Error at /opt/vcpkg/scripts/buildsystems/vcpkg.cmake:831 (message):
  vcpkg install failed.  See logs for more information:
  /home/guilherme/Desktop/pyoperon/build/vcpkg-manifest-install.log
Call Stack (most recent call first):
  /usr/share/cmake-3.16/Modules/CMakeDetermineSystem.cmake:93 (include)
  CMakeLists.txt:5 (project)


CMake Error: CMake was unable to find a build program corresponding to "Unix Makefiles".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!

The generated log file /home/guilherme/Desktop/pyoperon/build/vcpkg-manifest-install.log:

Fetching registry information from https://github.com/foolnotion/vcpkg-registry>
Fetching registry information from https://github.com/microsoft/vcpkg (HEAD)...
Error: Cycle detected during vstat:x64-linux:

I also tried to build C++ operon by giving the -DBUILD_PYBIND=ON option. It gives me the following output:

guilherme@mini-ITX:~/Desktop/operon$ cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/opt/vcpkg/scripts/buildsystems/vcpkg.cmake -DBUILD_PYBIND=ON
-- Running vcpkg install
-- Running vcpkg install - done
-- The CXX compiler identification is GNU 9.3.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.25.1") 
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Taskflow found. Headers: /home/guilherme/Desktop/operon/build/vcpkg_installed/x64-linux/include
-- Could NOT find aria-csv (missing: aria-csv_DIR)
-- Disabled features:
 * USE_OPENLIBM, Link against Julia's openlibm, a high performance mathematical library [default=OFF].
 * USE_JEMALLOC, Link against jemalloc, a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support [default=OFF].
 * USE_TCMALLOC, Link against tcmalloc (thread-caching malloc), a malloc(3) implementation that reduces lock contention for multi-threaded programs [default=OFF].
 * USE_MIMALLOC, Link against mimalloc, a general purpose allocator with excellent performance characteristics [default=OFF].
 * USE_SINGLE_PRECISION, Perform model evaluation using floats (single precision) instead of doubles. Great for reducing runtime, might not be appropriate for all purposes [default=OFF].
 * USE_CERES_NNLS, Use the non-linear least squares optimizer from Ceres solver to tune model coefficients (if OFF, Eigen::LevenbergMarquardt will be used instead).

-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
-- Configuring done
-- Generating done
CMake Warning:
  Manually-specified variables were not used by the project:

    BUILD_PYBIND


-- Build files have been written to: /home/guilherme/Desktop/operon/build

I managed to install pyoperon via nix, but the package is only available inside a new Python installation that does not integrate with my other Python packages and does not provide pip to install new ones.

Is there a way to make any of the building processes work? Or is there a way to use the nix version of pyoperon with my default anaconda3 python installation? My default python version is 3.9.7, which is pretty similar to the one nix installs.

ValueError: Input contains NaN.

I got an error when running Operon multiple times.

X, y = fetch_openml(data_id=1089, return_X_y=True)
X = StandardScaler().fit_transform(X)
X, y = np.array(X), np.array(y)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
simple_operon = []
for _ in range(20):
    e = OperonX(generations=100, population_size=100)
    e.fit(x_train, y_train)
    print(r2_score(y_train, e.predict(x_train)))
    print(r2_score(y_test, e.predict(x_test)))
    simple_operon.append(r2_score(y_test, e.predict(x_test)))

The error information is as follows:

Traceback (most recent call last):
  File "/tmp/pycharm_project_44/example/performance_evaluation.py", line 22, in <module>
    e.fit(x_train, y_train)
  File "/tmp/pycharm_project_44/sr_forest/operon_forest.py", line 461, in fit
    self.individuals_ = [get_solution_stats(x)[0] for x in gp.Individuals[:self.population_size]]
  File "/tmp/pycharm_project_44/sr_forest/operon_forest.py", line 461, in <listcomp>
    self.individuals_ = [get_solution_stats(x)[0] for x in gp.Individuals[:self.population_size]]
  File "/tmp/pycharm_project_44/sr_forest/operon_forest.py", line 438, in get_solution_stats
    mse = mean_squared_error(y, y_pred * scale + offset)
  File "/vol/ecrg-solar/zhangheng1/anaconda3/envs/gpgomenv/lib/python3.8/site-packages/sklearn/metrics/_regression.py", line 442, in mean_squared_error
    y_type, y_true, y_pred, multioutput = _check_reg_targets(
  File "/vol/ecrg-solar/zhangheng1/anaconda3/envs/gpgomenv/lib/python3.8/site-packages/sklearn/metrics/_regression.py", line 102, in _check_reg_targets
    y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)
  File "/vol/ecrg-solar/zhangheng1/anaconda3/envs/gpgomenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 899, in check_array
    _assert_all_finite(
  File "/vol/ecrg-solar/zhangheng1/anaconda3/envs/gpgomenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 146, in _assert_all_finite
    raise ValueError(msg_err)
ValueError: Input contains NaN.

Process finished with exit code 1

I guess the reason might be that some SR models predict NaN values, which leads scikit-learn to raise this error. However, I don't know how to fix this. Can you help me? Thanks.

Here is a reproducible example.
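
One way to keep a single bad model from aborting the whole loop is to check predictions for non-finite values before scoring them and to assign the worst possible error in that case. This is a minimal sketch of that guard in plain Python; safe_mse is a made-up name, and returning infinity for such models is an assumption about the desired behaviour, not something pyoperon or scikit-learn does.

```python
import math


def safe_mse(y_true, y_pred):
    """Mean squared error that returns inf instead of raising on NaN/inf predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not all(math.isfinite(p) for p in y_pred):
        return math.inf  # treat a model with invalid predictions as the worst one
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

In the traceback above, the equivalent check would go before the mean_squared_error call in get_solution_stats.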

`double free or corruption (out)`

Hi there,

I get double free or corruption (out) every time I run pyoperon installed using wheels from the actions page. I don't get this error using the released package here on GitHub, but that one is considerably outdated.

I've tried on two different Linux machines (CentOS7 and Arch-based Endeavour OS), with different Python versions (3.9 and 3.11).

Any way I can help debug further?

Best,
M

PS: I can't access any artefact on the actions page older than November as they have expired.

Unable to install pyoperon by the README's nix develop command

I am on a Macbook (M1, 2020).

When I run "nix develop github:heal-research/pyoperon --no-write-lock-file" in my "/Users/[MY_USER_NAME]" directory, I get the following error message:
warning: not writing modified lock file of flake 'github:heal-research/pyoperon':
• Added input 'flake-utils':
'github:numtide/flake-utils/a1720a10a6cfe8234c0e93907ffe81be440f4cef' (2023-05-31)
• Added input 'flake-utils/systems':
'github:nix-systems/default/da67096a3b9bf56a91d16901293e51ba5b49a27e' (2023-04-09)
• Added input 'foolnotion':
'github:foolnotion/nur-pkg/00f1e56faf00f6dca253ee3ed3e3327809a48852' (2023-05-29)
• Added input 'foolnotion/nixpkgs':
'github:NixOS/nixpkgs/729058d86a76758ba3bc08c20658647a427a772f' (2023-06-09)
• Added input 'nixpkgs':
'github:nixos/nixpkgs/729058d86a76758ba3bc08c20658647a427a772f' (2023-06-09)
• Added input 'pratt-parser':
'github:foolnotion/pratt-parser-calculator/025ba103339bb69e3b719b62f3457d5cbb9644e6' (2022-11-15)
• Added input 'pratt-parser/flake-utils':
'github:numtide/flake-utils/a1720a10a6cfe8234c0e93907ffe81be440f4cef' (2023-05-31)
• Added input 'pratt-parser/flake-utils/systems':
'github:nix-systems/default/da67096a3b9bf56a91d16901293e51ba5b49a27e' (2023-04-09)
• Added input 'pratt-parser/foolnotion':
'github:foolnotion/nur-pkg/00f1e56faf00f6dca253ee3ed3e3327809a48852' (2023-05-29)
• Added input 'pratt-parser/foolnotion/nixpkgs':
'github:NixOS/nixpkgs/729058d86a76758ba3bc08c20658647a427a772f' (2023-06-09)
• Added input 'pratt-parser/nixpkgs':
'github:nixos/nixpkgs/729058d86a76758ba3bc08c20658647a427a772f' (2023-06-09)
error:
… while calling the 'derivationStrict' builtin

     at /builtin/derivation.nix:9:12: (source not available)

   … while evaluating derivation 'nix-shell'
     whose name attribute is located at /nix/store/n39wd2j5hmg3cbn9kyqz8dw2kfbnvgfz-source/pkgs/stdenv/generic/make-derivation.nix:303:7

   … while evaluating attribute '__impureHostDeps' of derivation 'nix-shell'

     at /nix/store/n39wd2j5hmg3cbn9kyqz8dw2kfbnvgfz-source/pkgs/stdenv/generic/make-derivation.nix:462:7:

      461|       __propagatedSandboxProfile = lib.unique (computedPropagatedSandboxProfile ++ [ propagatedSandboxProfile ]);
      462|       __impureHostDeps = computedImpureHostDeps ++ computedPropagatedImpureHostDeps ++ __propagatedImpureHostDeps ++ __impureHostDeps ++ stdenv.__extraImpureHostDeps ++ [
         |       ^
      463|         "/dev/zero"

   error: evaluation aborted with the following error message: 'Function called without required argument "xxhash" at /nix/store/m7yb7kg5bny9xvn4chsc68nwjf0jjxk6-source/nix/operon/default.nix:25, did you mean "xxHash", "ethash" or "phash"?'

Afterwards, when I run "pip show pyoperon", I get the following message: "WARNING: Package(s) not found: pyoperon". So it is evident that pyoperon was not installed.
Does anyone have advice?

How to get the prediction of all models in the final population?

Hi! I want to implement an ensemble model based on the final population or an external archive, like [1]. However, I cannot find a way to get the prediction of all models in the final population. Can you help me with this? Thanks a lot!

[1]. Zhang, Hengzhe, Aimin Zhou, and Hu Zhang. "An Evolutionary Forest for Regression." IEEE Transactions on Evolutionary Computation (2021).
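
Once per-model predictions are available, the ensemble step itself is simple. The sketch below averages the predictions of several models row by row. Each model is represented by a plain Python callable, which is an assumption made for illustration; how to obtain such a callable from a pyoperon individual is exactly the open question in this issue and is not shown here.

```python
from statistics import fmean


def ensemble_predict(models, X):
    """Average the per-row predictions of several models.

    `models` is a list of callables mapping one input row to a number
    (hypothetical stand-ins for the individuals in the final population).
    """
    if not models:
        raise ValueError("need at least one model")
    per_model = [[model(row) for row in X] for model in models]
    # zip(*per_model) groups the predictions of all models for the same row.
    return [fmean(row_preds) for row_preds in zip(*per_model)]
```

For example, two models that predict 1 and 3 for the same row yield an ensemble prediction of 2.0 for that row.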

Problem with GLIBCXX_3.4.30 while importing package to python

Hello, thank you for this amazing package.
I've just installed the PyOperon package with the Nix package manager.
But whenever I run the test file, I get this error:
ImportError: /nix/store/k2a429wpxgfwp4jaacl9iaqw4kxqjaxa-gcc-11.3.0-lib/lib/libstdc++.so.6: version 'GLIBCXX_3.4.30' not found (required by /nix/store/qry0r9qpw00n1knhhiq5gn0xlgzygyav-operon/lib/liboperon.so.0)
One more problem with nix, since I'm not familiar with it: after I close the terminal in which I installed PyOperon and open another one, Python can't seem to find the package:
ModuleNotFoundError: No module named 'operon'

Best Regards

switch project to nanobind

nanobind is a pybind11 successor library that leverages modern C++ to improve compilation speed, produce smaller binaries, and deliver better runtime performance. It would be a great fit for pyoperon and would reduce the future development effort for the Python bindings.

https://nanobind.readthedocs.io/en/latest/why.html