heal-research / pyoperon
Python bindings and scikit-learn interface for the Operon library for symbolic regression.
License: MIT License
Hello,
After a successful installation, running the examples that use sklearn results in a segmentation fault. After some debugging I narrowed the error down to running the MinimumDescriptionLengthEvaluator on the results to obtain the best model. If I comment out that part in sklearn.py and just obtain the Pareto front, or use BIC or AIC, everything works fine. The other examples also work perfectly.
This error has been reproduced on several different machines. Have you ever encountered it?
Thank you very much for any comments,
Best,
David
Right now, optimizing a tree with pyoperon requires a lot of ugly code:
import numpy as np
# `op` is the pyoperon module, imported under that alias as in the original snippet

def evaluate_with_pyoperon(pdata, tree, range_train, range_test):
    a, b = range_train
    c, d = range_test

    # wrap the dataframe in a pyoperon dataset
    pyop_dataset = op.Dataset(pdata.values)
    pyop_dataset.VariableNames = pdata.columns
    pyop_range_tr = op.Range(a, b)
    pyop_range_te = op.Range(c, d)

    # the last column is the target, all other columns are inputs
    pyop_vars = sorted(pyop_dataset.Variables, key=lambda v: v.Index)
    pyop_hashes = [v.Hash for v in pyop_vars[:-1]]
    pyop_target = pyop_vars[-1]

    pyop_problem = op.Problem(pyop_dataset, pyop_range_tr, pyop_range_te)
    pyop_problem.InputHashes = pyop_hashes
    pyop_problem.Target = pyop_target

    # tune the tree coefficients with Levenberg-Marquardt
    pyop_dt = op.DispatchTable()
    pyop_opt = op.LMOptimizer(pyop_dt, pyop_problem, max_iter=20)
    rng = op.RomuTrio(np.random.randint(1, 1_000_000))
    summary = pyop_opt.Optimize(rng, tree)
    if summary.Success:
        # the original snippet used an undefined `pyop_tree` here
        tree.SetCoefficients(summary.FinalParameters)

    # evaluate the (possibly optimized) tree on all rows
    range_full = op.Range(0, pyop_dataset.Rows)
    return op.Evaluate(pyop_dt, tree, pyop_dataset, range_full)
This should not be so complicated. At the very least, if pdata is a dataframe, we should hide the construction of the dataset and problem and offer a simplified API.
Hi,
I've been able to isolate a segmentation fault to the use of Operon.MultiEvaluator, specifically when you add more than one evaluator to the MultiEvaluator object.
Simple reproducible example (replacing the Operon.Evaluator() definition in example/operon-bindings.py ):
evaluator = Operon.MultiEvaluator(problem)
for i in range(2):  # works fine if changed to range(1)
    evaluator_i = Operon.Evaluator(problem, dtable, error_metric, True)  # initialize evaluator, use linear scaling = True
    evaluator_i.Budget = 1000 * 1000  # computational budget
    optimizer = Operon.LMOptimizer(dtable, problem, max_iter=3)
    evaluator_i.Optimizer = optimizer
    evaluator.Add(evaluator_i)
aggregateEvaluator = Operon.AggregateEvaluator(evaluator)
aggregateEvaluator.AggregateType = Operon.AggregateType.Max
# define how new offspring are created
generator = Operon.BasicOffspringGenerator(aggregateEvaluator, crossover, mutation, selector, selector)
Not sure if this bug appears on Linux machines as well. Segfault still occurs when not applying aggregateEvaluator and instead feeding evaluator directly into the last line, so the issue does not seem to be with AggregateEvaluator. Using Python 3.11, and working on a MacBook M1 Pro with Sonoma 14.5 Beta. Installed using git clone + pip instructions.
Hi @foolnotion,
I am trying to run the srbench suite for a paper I am writing on my code PySR, but I am having difficulty setting up operon. I first tried the conda build script in srbench for operon but couldn't get it to work due to various issues when building eve: it seems unable to interpret what float_ and the other custom category::{type} types mean unless they are explicitly labeled as category::float_, etc. Not sure why this error occurs.
I decided it would be too difficult to manually fix those issues, so yesterday I decided instead to try out nix using nix-portable since this looks like the recommended approach for building operon. I was able to set things up with nix-portable, and things build correctly on my cluster.
However, when I try to actually import operon, I see the following issue:
[worker5026 shm]$ python -c 'from operon.sklearn import SymbolicRegressor'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/nix/store/37yavbixv6larr8yiaf8x7j5lksg74nz-pyoperon/operon/__init__.py", line 4, in <module>
from .pyoperon import *
ImportError: /mnt/sw/nix/store/kcrf6n4dmr5blhw2hzfy1j588bri8dzw-gcc-10.3.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /nix/store/37yavbixv6larr8yiaf8x7j5lksg74nz-pyoperon/operon/pyoperon.cpython-39-x86_64-linux-gnu.so)
I took a look at https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html and found that I need gcc 11.1+ loaded for this GLIBCXX version, rather than gcc-10.3.0. So, I did module unload gcc && module load gcc/11.2.0 to get the correct paths set up. Then I tried again, but now it gives me the following error:
[worker5026 shm]$ python -c 'from operon.sklearn import SymbolicRegressor'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/nix/store/37yavbixv6larr8yiaf8x7j5lksg74nz-pyoperon/operon/__init__.py", line 4, in <module>
from .pyoperon import *
ModuleNotFoundError: No module named 'operon.pyoperon'
Any idea how to fix this?
Thanks!
Miles
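One way to check which GLIBCXX symbol versions a given libstdc++.so.6 actually provides is to scan the library file for version tags, much like running strings on it and filtering for GLIBCXX. The sketch below does this in Python. The function names are illustrative and are not part of pyoperon or nix, and the path in the usage note is a placeholder.

```python
import re
from pathlib import Path

# Matches version tags such as GLIBCXX_3.4 or GLIBCXX_3.4.29.
_GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(blob):
    """Return the sorted, de-duplicated GLIBCXX versions found in raw bytes."""
    found = {
        tuple(int(part) for part in match.group(1).decode().split("."))
        for match in _GLIBCXX_TAG.finditer(blob)
    }
    return sorted(found)


def provides(blob, required):
    """True if the exact tag (e.g. 'GLIBCXX_3.4.29') appears in the bytes."""
    number = required[len("GLIBCXX_"):]
    wanted = tuple(int(part) for part in number.split("."))
    return wanted in glibcxx_versions(blob)


def library_provides(path, required):
    """Convenience wrapper that reads a shared library from disk."""
    return provides(Path(path).read_bytes(), required)
```

Calling library_provides("/path/to/libstdc++.so.6", "GLIBCXX_3.4.29") on the library that the Python process ends up loading shows whether it is new enough for the compiled extension.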
Greetings from Prague,
We tried to install pyoperon on our HPC cluster (Rocky Linux 8.9) following the README.md instructions. Without explicitly specifying
export CXX=${CONDA_PREFIX}/bin/clang++
export CC=${CONDA_PREFIX}/bin/clang
the build fails.
Investigation revealed that conda uses the GCC compiler suite by default, and GCC does not handle this code well. Specifically, there are multiple errors similar to:
/home/jose/projects/pyoperon_v4/pyoperon/operon/include/operon/interpreter/dispatch_table.hpp:111:2: error: extra ';' [-Werror=pedantic]
111 | };
and
/home/jose/projects/pyoperon_v4/pyoperon/operon/include/operon/interpreter/dispatch_table.hpp:165:71: internal compiler error: Segmentation fault
165 | using Typ = std::conditional_t<detail::ExtentsLike<Lst>, decltype([]<auto... Idx>(std::index_sequence<Idx...>){
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
166 | return std::make_tuple(std::tuple_element_t<Idx, Tup>{}...);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
167 | }(std::make_index_sequence<sizeof...(Ts)-1>{})), Tup>;
I tested multiple versions of GCC, and after getting to know the codebase better I understood (hopefully correctly) that the code is meant to be built with clang.
Specifying
export CXX=${CONDA_PREFIX}/bin/clang++
export CC=${CONDA_PREFIX}/bin/clang
makes the build work flawlessly, so perhaps it would be useful to add this hint to README.md, or possibly directly to ./script/dependencies.sh?
cheers
josef
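The same workaround can be applied from a Python build script by setting CC and CXX only for the build subprocess rather than for the whole shell. This is a minimal sketch: clang_build_env is a hypothetical helper, and it assumes that clang and clang++ are installed under the conda prefix on a POSIX system, as in the report above.

```python
import os
from pathlib import PurePosixPath


def clang_build_env(conda_prefix, base_env=None):
    """Return a copy of the environment with CC/CXX pointing at conda's clang."""
    # Copy so the caller's environment (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = PurePosixPath(conda_prefix) / "bin"
    env["CC"] = str(bin_dir / "clang")
    env["CXX"] = str(bin_dir / "clang++")
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run for whatever build command is in use, with os.environ["CONDA_PREFIX"] as the prefix.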
I'm trying to save a pyoperon model using pickle:
import pickle
from pyoperon.sklearn import SymbolicRegressor
reg = SymbolicRegressor()
reg.fit(X_train, y_train)
filename = 'operon_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(reg, f)
but I get the following error:
TypeError: cannot pickle 'pyoperon.pyoperon.Variable' object
Can you help me with this issue?
Thanks
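To narrow down a "cannot pickle" error like this one, it can help to test each attribute of the object separately and see which ones block pickling. The sketch below is a generic diagnostic, not pyoperon code: picklable_attributes is a hypothetical helper, and _Native and _Model are stand-ins for an extension-type object and a fitted estimator.

```python
import pickle


def picklable_attributes(obj):
    """Split vars(obj) into a picklable dict and the names that were dropped."""
    kept, dropped = {}, []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:
            # pickle can raise TypeError, PicklingError or AttributeError here.
            dropped.append(name)
        else:
            kept[name] = value
    return kept, dropped


class _Native:
    """Stand-in for a native object that refuses to be pickled."""

    def __reduce__(self):
        raise TypeError("cannot pickle '_Native' object")


class _Model:
    """Stand-in for an estimator holding both plain and native attributes."""

    def __init__(self):
        self.generations = 100
        self.name = "demo"
        self.variables_ = [_Native()]


kept, dropped = picklable_attributes(_Model())
print("picklable:", sorted(kept))
print("blocking:", dropped)
```

The attributes reported as blocking are the ones that would need to be excluded or converted to plain Python data before the object can be saved.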
I'm trying to build the pyoperon package by following the instructions in BUILDING.md. I'm using vcpkg to install dependencies and managed to install all required dependencies, including the C++ operon version.
When I try to build pyoperon it gives me the following error:
guilherme@mini-ITX:~/Desktop/pyoperon$ cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/opt/vcpkg/scripts/buildsystems/vcpkg.cmake
-- Running vcpkg install
-- Running vcpkg install - failed
CMake Error at /opt/vcpkg/scripts/buildsystems/vcpkg.cmake:831 (message):
vcpkg install failed. See logs for more information:
/home/guilherme/Desktop/pyoperon/build/vcpkg-manifest-install.log
Call Stack (most recent call first):
/usr/share/cmake-3.16/Modules/CMakeDetermineSystem.cmake:93 (include)
CMakeLists.txt:5 (project)
CMake Error: CMake was unable to find a build program corresponding to "Unix Makefiles". CMAKE_MAKE_PROGRAM is not set. You probably need to select a different build tool.
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!
The generated log file /home/guilherme/Desktop/pyoperon/build/vcpkg-manifest-install.log:
Fetching registry information from https://github.com/foolnotion/vcpkg-registry>
Fetching registry information from https://github.com/microsoft/vcpkg (HEAD)...
Error: Cycle detected during vstat:x64-linux:
I also tried to build C++ operon by giving the -DBUILD_PYBIND=ON option. It gives me the following output:
guilherme@mini-ITX:~/Desktop/operon$ cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/opt/vcpkg/scripts/buildsystems/vcpkg.cmake -DBUILD_PYBIND=ON
-- Running vcpkg install
-- Running vcpkg install - done
-- The CXX compiler identification is GNU 9.3.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.25.1")
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Taskflow found. Headers: /home/guilherme/Desktop/operon/build/vcpkg_installed/x64-linux/include
-- Could NOT find aria-csv (missing: aria-csv_DIR)
-- Disabled features:
* USE_OPENLIBM, Link against Julia's openlibm, a high performance mathematical library [default=OFF].
* USE_JEMALLOC, Link against jemalloc, a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support [default=OFF].
* USE_TCMALLOC, Link against tcmalloc (thread-caching malloc), a malloc(3) implementation that reduces lock contention for multi-threaded programs [default=OFF].
* USE_MIMALLOC, Link against mimalloc, a general purpose allocator with excellent performance characteristics [default=OFF].
* USE_SINGLE_PRECISION, Perform model evaluation using floats (single precision) instead of doubles. Great for reducing runtime, might not be appropriate for all purposes [default=OFF].
* USE_CERES_NNLS, Use the non-linear least squares optimizer from Ceres solver to tune model coefficients (if OFF, Eigen::LevenbergMarquardt will be used instead).
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
BUILD_PYBIND
-- Build files have been written to: /home/guilherme/Desktop/operon/build
I managed to install pyoperon via nix, but the package is only available inside a new Python installation that does not integrate with my other Python packages and does not provide pip to install new ones.
Is there a way to make any of the build processes work? Or is there a way to use the nix version of pyoperon with my default anaconda3 Python installation? My default Python version is 3.9.7, which is close to the one nix installs.
I got an error when running Operon multiple times.
X, y = fetch_openml(data_id=1089, return_X_y=True)
X = StandardScaler().fit_transform(X)
X, y = np.array(X), np.array(y)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
simple_operon = []
for _ in range(20):
    e = OperonX(generations=100, population_size=100)
    e.fit(x_train, y_train)
    print(r2_score(y_train, e.predict(x_train)))
    print(r2_score(y_test, e.predict(x_test)))
    simple_operon.append(r2_score(y_test, e.predict(x_test)))
The error information is as follows:
Traceback (most recent call last):
File "/tmp/pycharm_project_44/example/performance_evaluation.py", line 22, in <module>
e.fit(x_train, y_train)
File "/tmp/pycharm_project_44/sr_forest/operon_forest.py", line 461, in fit
self.individuals_ = [get_solution_stats(x)[0] for x in gp.Individuals[:self.population_size]]
File "/tmp/pycharm_project_44/sr_forest/operon_forest.py", line 461, in <listcomp>
self.individuals_ = [get_solution_stats(x)[0] for x in gp.Individuals[:self.population_size]]
File "/tmp/pycharm_project_44/sr_forest/operon_forest.py", line 438, in get_solution_stats
mse = mean_squared_error(y, y_pred * scale + offset)
File "/vol/ecrg-solar/zhangheng1/anaconda3/envs/gpgomenv/lib/python3.8/site-packages/sklearn/metrics/_regression.py", line 442, in mean_squared_error
y_type, y_true, y_pred, multioutput = _check_reg_targets(
File "/vol/ecrg-solar/zhangheng1/anaconda3/envs/gpgomenv/lib/python3.8/site-packages/sklearn/metrics/_regression.py", line 102, in _check_reg_targets
y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)
File "/vol/ecrg-solar/zhangheng1/anaconda3/envs/gpgomenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 899, in check_array
_assert_all_finite(
File "/vol/ecrg-solar/zhangheng1/anaconda3/envs/gpgomenv/lib/python3.8/site-packages/sklearn/utils/validation.py", line 146, in _assert_all_finite
raise ValueError(msg_err)
ValueError: Input contains NaN.
Process finished with exit code 1
I guess the reason is that some SR models predict NaN values, which leads scikit-learn to raise this error. However, I don't know how to fix it. Can you help me deal with this problem? Thanks.
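If that guess is right, one way to keep a single non-finite prediction from aborting the whole run is to check the predictions before scoring them. The sketch below is plain Python with hypothetical names (safe_mse, penalty). It does not use or change anything in pyoperon or scikit-learn, and it assumes the check is applied to the final predictions, after any scaling and offset.

```python
import math


def safe_mse(y_true, y_pred, penalty=math.inf):
    """Mean squared error that returns `penalty` instead of raising on NaN/inf."""
    y_true = [float(v) for v in y_true]
    y_pred = [float(v) for v in y_pred]
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("empty input")
    if not all(math.isfinite(v) for v in y_pred):
        # A model that predicts NaN or inf is scored as the worst possible model.
        return penalty
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

Scoring a model this way lets the loop over individuals continue, and models with invalid output simply rank last.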
Hi there,
I get double free or corruption (out) every time I run pyoperon installed using wheels from the actions page. I don't get this error using the released package here on GitHub, but that one is considerably outdated.
I've tried on two different Linux machines (CentOS 7 and Arch-based EndeavourOS), with different Python versions (3.9 and 3.11).
Any way I can help debug further?
Best,
M
PS: I can't access any artefact on the actions page older than November as they have expired.
I am on a MacBook (M1, 2020).
When I run "nix develop github:heal-research/pyoperon --no-write-lock-file" in my "/Users/[MY_USER_NAME]" directory, I get the following error message:
warning: not writing modified lock file of flake 'github:heal-research/pyoperon':
• Added input 'flake-utils':
'github:numtide/flake-utils/a1720a10a6cfe8234c0e93907ffe81be440f4cef' (2023-05-31)
• Added input 'flake-utils/systems':
'github:nix-systems/default/da67096a3b9bf56a91d16901293e51ba5b49a27e' (2023-04-09)
• Added input 'foolnotion':
'github:foolnotion/nur-pkg/00f1e56faf00f6dca253ee3ed3e3327809a48852' (2023-05-29)
• Added input 'foolnotion/nixpkgs':
'github:NixOS/nixpkgs/729058d86a76758ba3bc08c20658647a427a772f' (2023-06-09)
• Added input 'nixpkgs':
'github:nixos/nixpkgs/729058d86a76758ba3bc08c20658647a427a772f' (2023-06-09)
• Added input 'pratt-parser':
'github:foolnotion/pratt-parser-calculator/025ba103339bb69e3b719b62f3457d5cbb9644e6' (2022-11-15)
• Added input 'pratt-parser/flake-utils':
'github:numtide/flake-utils/a1720a10a6cfe8234c0e93907ffe81be440f4cef' (2023-05-31)
• Added input 'pratt-parser/flake-utils/systems':
'github:nix-systems/default/da67096a3b9bf56a91d16901293e51ba5b49a27e' (2023-04-09)
• Added input 'pratt-parser/foolnotion':
'github:foolnotion/nur-pkg/00f1e56faf00f6dca253ee3ed3e3327809a48852' (2023-05-29)
• Added input 'pratt-parser/foolnotion/nixpkgs':
'github:NixOS/nixpkgs/729058d86a76758ba3bc08c20658647a427a772f' (2023-06-09)
• Added input 'pratt-parser/nixpkgs':
'github:nixos/nixpkgs/729058d86a76758ba3bc08c20658647a427a772f' (2023-06-09)
error:
… while calling the 'derivationStrict' builtin
at /builtin/derivation.nix:9:12: (source not available)
… while evaluating derivation 'nix-shell'
whose name attribute is located at /nix/store/n39wd2j5hmg3cbn9kyqz8dw2kfbnvgfz-source/pkgs/stdenv/generic/make-derivation.nix:303:7
… while evaluating attribute '__impureHostDeps' of derivation 'nix-shell'
at /nix/store/n39wd2j5hmg3cbn9kyqz8dw2kfbnvgfz-source/pkgs/stdenv/generic/make-derivation.nix:462:7:
461| __propagatedSandboxProfile = lib.unique (computedPropagatedSandboxProfile ++ [ propagatedSandboxProfile ]);
462| __impureHostDeps = computedImpureHostDeps ++ computedPropagatedImpureHostDeps ++ __propagatedImpureHostDeps ++ __impureHostDeps ++ stdenv.__extraImpureHostDeps ++ [
| ^
463| "/dev/zero"
error: evaluation aborted with the following error message: 'Function called without required argument "xxhash" at /nix/store/m7yb7kg5bny9xvn4chsc68nwjf0jjxk6-source/nix/operon/default.nix:25, did you mean "xxHash", "ethash" or "phash"?'
Afterwards, when I run "pip show pyoperon", I get the following message: "WARNING: Package(s) not found: pyoperon". So it is pretty evident that pyoperon was not installed.
Does anyone have advice?
Hi! I want to implement an ensemble model based on the final population or an external archive, like [1]. However, I cannot find a way to get the prediction of all models in the final population. Can you help me with this? Thanks a lot!
[1]. Zhang, Hengzhe, Aimin Zhou, and Hu Zhang. "An Evolutionary Forest for Regression." IEEE Transactions on Evolutionary Computation (2021).
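Assuming the predictions of each individual in the final population can be obtained (which is what the question asks for), combining them into an ensemble is the easy part. A minimal sketch with a hypothetical ensemble_mean helper, which averages element-wise and skips models with non-finite output. Plain averaging is used here for illustration and is not necessarily the scheme used in [1].

```python
import math


def ensemble_mean(predictions):
    """Average several models' predictions element-wise.

    `predictions` holds one sequence of floats per model.
    Models with any NaN/inf prediction are left out of the average.
    """
    rows = [list(p) for p in predictions]
    usable = [p for p in rows if all(math.isfinite(v) for v in p)]
    if not usable:
        raise ValueError("no model produced finite predictions")
    n_rows = len(usable[0])
    if any(len(p) != n_rows for p in usable):
        raise ValueError("all models must predict the same number of rows")
    # zip(*usable) yields, for each data row, the predictions of all usable models.
    return [sum(column) / len(usable) for column in zip(*usable)]
```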
Hello, thank you for this amazing package.
I've just installed the PyOperon package with the Nix package manager.
But whenever I run the test file, I get this error:
ImportError: /nix/store/k2a429wpxgfwp4jaacl9iaqw4kxqjaxa-gcc-11.3.0-lib/lib/libstdc++.so.6: version 'GLIBCXX_3.4.30' not found (required by /nix/store/qry0r9qpw00n1knhhiq5gn0xlgzygyav-operon/lib/liboperon.so.0)
One more problem with nix, since I'm not familiar with it: after I close the terminal in which I installed PyOperon and open another one, it seems that Python can't find the package:
ModuleNotFoundError: No module named 'operon'
Best Regards
nanobind is a pybind11 successor library which leverages modern C++ to improve compilation speed, produce smaller binaries and get better runtime performance. It would be a great fit for pyoperon and would reduce the effort needed for further development of the Python bindings.