Comments (8)
Hm, I wonder if `select_k_features` is not applying the selection mask to `X_units` and `y_units`... Does it work without that set?
from pysr.
Also, I would consider lowering the `dimensional_constraint_penalty`. Sometimes when it is too harsh of a penalty, it prevents the search from exploring efficiently.
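To see why the penalty's size relative to typical losses matters, here is a minimal sketch. It assumes the penalty is simply added to the data loss whenever the dimensional check fails, which is how we read PySR's description of `dimensional_constraint_penalty`. The function `penalized_loss` and all numbers are hypothetical.

```python
# Minimal sketch, assuming an *additive* penalty for failing the unit check.
# `penalized_loss` and all numbers below are hypothetical illustrations.


def penalized_loss(data_loss: float, violates_units: bool, penalty: float) -> float:
    """Loss used to rank a candidate: data loss, plus the penalty if units fail."""
    return data_loss + penalty if violates_units else data_loss


# A dimensionally consistent candidate with a mediocre fit...
consistent = penalized_loss(0.8, violates_units=False, penalty=1000.0)
# ...versus a better-fitting candidate that fails the unit check.
harsh = penalized_loss(0.1, violates_units=True, penalty=1000.0)
gentle = penalized_loss(0.1, violates_units=True, penalty=1.0)

print(consistent, harsh, gentle)
```

With the large penalty, the failing candidate ranks about three orders of magnitude worse than the consistent one. With the smaller penalty it stays in the same range as the consistent candidate, which is roughly the intuition behind lowering the penalty when losses are of order 1.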
Hm, it looks like I did indeed remember to apply the selection mask to the units as well:
Lines 1541 to 1543 in 57dd7d2
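Conceptually, "applying the selection mask to the units" means dropping the same entries from the per-feature unit list as from the columns of `X`. Below is a minimal sketch, assuming a boolean per-feature mask; `apply_selection_mask` is a hypothetical helper, not the code referenced above. Since `y` has no feature columns, this sketch only masks `X` and `X_units`.

```python
# Hypothetical helper (not PySR's code): keep only the selected feature
# columns, and keep the matching entries of the per-feature unit list.
from typing import Sequence


def apply_selection_mask(
    X: Sequence[Sequence[float]],
    X_units: Sequence[str],
    mask: Sequence[bool],
) -> tuple[list[list[float]], list[str]]:
    if len(X_units) != len(mask):
        raise ValueError("need one unit string and one mask entry per feature")
    # Same mask for both, so column i of X_sel still has unit units_sel[i].
    X_sel = [[v for v, keep in zip(row, mask) if keep] for row in X]
    units_sel = [u for u, keep in zip(X_units, mask) if keep]
    return X_sel, units_sel


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
X_units = ["kg", "m", "s"]
mask = [True, False, True]

X_sel, units_sel = apply_selection_mask(X, X_units, mask)
print(X_sel)      # [[1.0, 3.0], [4.0, 6.0]]
print(units_sel)  # ['kg', 's']
```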
Actually, maybe I misunderstood the problem. It could already be working, but maybe the output is unclear. Could you describe what you mean here, with an example?
> The function chosen, with a loss of 0.8, is not dimensionally consistent with itself, and is not even close to J/V.

Note that any constants found during the search actually have their own units. The string [⋅] basically means it can take on any units that make the equation work. (I'd consider adding an option to remove this "wildcard dimensions" functionality if you want.)
> Also, I would consider lowering the dimensional_constraint_penalty. Sometimes when it is too harsh of a penalty, it prevents the search from exploring efficiently.

Thanks for the tip, I will try it. I had set it to such a high value because it is the one suggested in the dimensional analysis example, but I guess in that case it was so high because the values were also very high.
> Actually maybe I misunderstood the problem. It could already be working but maybe the output is unclear. Could you describe:
>> The function chosen, with a loss of 0.8, is not dimensionally consistent with itself, and is not even close to J/V.
>
> what you mean here with an example? Note that any constants found during the search actually have their own units. The string [⋅] basically means it can take on any units that make the equation work. (I'd consider adding an option to remove this "wildcard dimensions" functionality if you want)

I was thinking exactly that: maybe this was not a bug, just a misunderstanding on my part about the functionality. From what I gathered, there is no way to get the individual units of each constant term, which could be important.
About your suggestion: removing the wildcard, or at least having a way to distinguish between unitless constants and constants with units (for instance, by giving them a chosen complexity), would be very useful. Maybe I'm misunderstanding the algorithm behind the scenes, but if any constant can take on any unit, then it seems likely to bypass the unit search (maybe in my case this was more pressing, because some of my variables were indeed constants).
So this was clearly not a bug, just a misunderstanding on my part.
Right, as an example, the expression:
"y[m s⁻² kg] = (M[kg] * 2.6353e-22[⋅])"
is actually dimensionally consistent, because the ⋅, when solved, can take on the units of m s⁻².
However, the expression:
"y[m s⁻² kg] = (M[kg] * 2.6353e-22[⋅] + m[kg])"
would not be dimensionally consistent, because there are no units that could be inserted into the ⋅ to make this expression work.
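The "solving" step for the two expressions above can be done by hand with unit exponents. Here is a minimal sketch that represents units as base-unit to exponent maps; the `divide` helper and this representation are assumptions for illustration, not how SymbolicRegression.jl stores units.

```python
# Units as base-unit -> exponent maps, e.g. "m s⁻² kg" -> {"m": 1, "s": -2, "kg": 1}.
# `divide` is a hypothetical helper for this sketch.
from collections import Counter

Units = dict[str, int]


def divide(a: Units, b: Units) -> Units:
    """Units of a / b: subtract exponents, dropping any that cancel to zero."""
    out = Counter(a)
    out.subtract(b)
    return {k: v for k, v in out.items() if v != 0}


y = {"m": 1, "s": -2, "kg": 1}
M = {"kg": 1}
m = {"kg": 1}

# Example 1: y = M * c  =>  [c] = [y] / [M]
c1 = divide(y, M)
print(c1)  # {'m': 1, 's': -2}

# Example 2: y = M * c + m. Addition forces [M * c] == [m], so [c] = [m] / [M]
# (dimensionless); the sum then has units [m], which would also have to equal [y].
c2 = divide(m, M)
sum_units = m
print(c2, sum_units == y)  # {} False
```

In the first case the constant absorbs exactly m s⁻². In the second, the addition already pins the constant down, leaving nothing to absorb the mismatch with y.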
So, you may be asking: why not show units in the ⋅ instead of just leaving it blank (and having the user figure it out afterwards)? The reason is basically that I found it is much faster to check dimensional consistency this way. If we were to solve exactly which units should be used in each ⋅, it would be a bit slower (not to mention that sometimes there are multiple solutions). For the dimensional check, you basically just have to trace from the leaves of the expression upwards, recording whether there is a "wildcard" dimension or not. But to get the specific units, you would have to first trace from the leaves to the root, and then from the root back to the leaves to fill in the units.
Since we need to evaluate dimensional consistency very rapidly, the tradeoff just did not seem worth it, compared to the user figuring out the units afterwards.
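To illustrate the "multiple solutions" point, consider a hypothetical expression with two free constants c₁ and c₂ (not one from this thread), y = c₁ · c₂ · M, using the units of y and M from the examples above:

```latex
[y] = [c_1]\,[c_2]\,[M]
\quad\Longrightarrow\quad
[c_1]\,[c_2] = \frac{\mathrm{m\,s^{-2}\,kg}}{\mathrm{kg}} = \mathrm{m\,s^{-2}}
```

Both [c₁] = m, [c₂] = s⁻² and [c₁] = 1 (dimensionless), [c₂] = m s⁻² satisfy this, so there is no single set of units to display. The bottom-up consistency check only needs to know that some wildcard is available to absorb the mismatch.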
But maybe there is a fast way to do it, and we could display the units instead of ⋅. The dimensional analysis portion of the code is fairly recent, and I'm open to suggestions/changing it!
You can see the dimensional analysis code here:
For example, the code for the addition and subtraction operators is given here (`op` is either `+` or `-`):
```julia
@eval function $(op)(l::W, r::W) where {Q,W<:WildcardQuantity{Q}}
    l.violates && return l
    r.violates && return r
    if same_dimensions(l, r)
        return W($(op)(l.val, r.val), l.wildcard && r.wildcard, false)
    elseif l.wildcard && r.wildcard
        return W(
            constructor_of(Q)($(op)(ustrip(l), ustrip(r)), typeof(dimension(l))),
            true,
            false,
        )
    elseif l.wildcard
        return W($(op)(constructor_of(Q)(ustrip(l), dimension(r)), r.val), false, false)
    elseif r.wildcard
        return W($(op)(l.val, constructor_of(Q)(ustrip(r), dimension(l))), false, false)
    else
        return W(one(Q), false, true)
    end
end
```
You can see there are five branches (after first checking whether either the left or right argument is already dimensionally invalid):
- Both left and right have the same dimensions => the expression is valid, AND the wildcard flag is propagated upwards to the parent expression if both sides carry one (so it can be consumed later).
- Otherwise, if both left and right have a wildcard => the expression is valid AND the wildcard is propagated.
- Otherwise, if the left has a wildcard => the expression is valid, and no wildcard is propagated. This consumes the wildcard (basically the point that sets the unit of the constant; maybe we could fill in the unit with a pointer here...).
- Similarly for the right argument.
- Otherwise, the expression is invalid: it is not dimensionally consistent and there is no wildcard.
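Here is a dimensions-only Python port of those five branches, plus a product rule, applied to the two example expressions from earlier in this thread. Assumptions: `WQ` is a toy stand-in for `WildcardQuantity` that drops the numeric value. The multiplication rule (a product keeps a wildcard if either factor has one) and checking the root against y's units via one more subtraction are our assumptions, not confirmed from the SymbolicRegression.jl source.

```python
from __future__ import annotations

from dataclasses import dataclass

# Dimensions as exponents over a fixed base, here (m, s, kg).
Dims = tuple[int, int, int]
DIMENSIONLESS: Dims = (0, 0, 0)


@dataclass(frozen=True)
class WQ:
    """Toy stand-in for WildcardQuantity: dimensions only, no numeric value."""
    dims: Dims
    wildcard: bool = False
    violates: bool = False


def add(l: WQ, r: WQ) -> WQ:
    # Mirrors the five branches of the Julia `+`/`-` rule, dimensions only.
    if l.violates:
        return l
    if r.violates:
        return r
    if l.dims == r.dims:                     # 1. same dimensions
        return WQ(l.dims, l.wildcard and r.wildcard)
    if l.wildcard and r.wildcard:            # 2. both wildcard: stays wildcard
        return WQ(DIMENSIONLESS, True)
    if l.wildcard:                           # 3. left wildcard takes r's dims
        return WQ(r.dims)
    if r.wildcard:                           # 4. right wildcard takes l's dims
        return WQ(l.dims)
    return WQ(DIMENSIONLESS, violates=True)  # 5. inconsistent, no wildcard


def mul(l: WQ, r: WQ) -> WQ:
    # Assumption: exponents add, and a wildcard survives if either factor has one.
    if l.violates:
        return l
    if r.violates:
        return r
    dims = tuple(a + b for a, b in zip(l.dims, r.dims))
    return WQ(dims, l.wildcard or r.wildcard)


def matches_target(expr: WQ, target: Dims) -> bool:
    # Assumption: compare the root to y's units as if computing expr - y.
    return not add(expr, WQ(target)).violates


KG: Dims = (0, 0, 1)
Y: Dims = (1, -2, 1)  # m s⁻² kg

M = WQ(KG)
m = WQ(KG)
c = WQ(DIMENSIONLESS, wildcard=True)  # free constant, units undetermined

expr1 = mul(M, c)          # M * c
expr2 = add(mul(M, c), m)  # M * c + m

print(matches_target(expr1, Y))  # True
print(matches_target(expr2, Y))  # False
```

Only one upward pass is needed. Each node carries a "wildcard still available" flag, which is consumed by branch 3 or 4, and the constant's actual units are never computed.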
The type used for "wildcard" quantities is this one:
"""
WildcardQuantity{Q<:AbstractQuantity}
A wrapper for a `AbstractQuantity` that allows for a wildcard feature, indicating
there is a free constant whose dimensions are not yet determined.
Also stores a flag indicating whether an expression is dimensionally consistent.
"""
struct WildcardQuantity{Q<:AbstractQuantity}
val::Q
wildcard::Bool
violates::Bool
end
`Q` is a quantity-like type. The quantity objects come from the units package DynamicQuantities.jl, but `WildcardQuantity` itself is defined in SymbolicRegression.jl.
Also, as a quick and dirty way to avoid learned constants, you can use `complexity_of_constants=100`. Then constants are prohibitively expensive, so the search will avoid them.
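One way to see why this works is to do the complexity arithmetic. Here is a minimal sketch, assuming every operator and variable costs 1, each constant costs `complexity_of_constants`, and a hard cap of 20 on total complexity (standing in for PySR's `maxsize`). The tree encoding, the `complexity` function, and the variable `a` are hypothetical.

```python
# Expression trees as nested tuples: (op, child, ...). String leaves are
# variables; float leaves are learned constants (conventions of this sketch).


def complexity(node, complexity_of_constants: int = 1) -> int:
    if isinstance(node, tuple):
        _op, *children = node
        return 1 + sum(complexity(ch, complexity_of_constants) for ch in children)
    if isinstance(node, float):  # a learned constant
        return complexity_of_constants
    return 1                     # a variable


MAXSIZE = 20  # assumed cap on expression complexity for this sketch

with_const = ("*", "M", 2.6353e-22)  # M * c
no_const = ("*", "M", "a")           # M * a, with `a` a hypothetical variable

for cost in (1, 100):
    c1 = complexity(with_const, cost)
    c2 = complexity(no_const, cost)
    print(cost, c1, c1 <= MAXSIZE, c2, c2 <= MAXSIZE)
```

With a cost of 100, even a single constant pushes an expression far past the cap, while constant-free expressions are unaffected.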