Comments (10)
While this is not the focus of this library, it is possible to use pwlf to find an optimum number of line segments.
The problem with automatically detecting the number of line segments is that the objective function is not as well defined. Many questions will need to be answered. Do you introduce a regularizer? Do you introduce a penalty? If you do not have a lot of noise, it's likely that without some form of regularization or penalty function that you may have as many line segments as you have data points. This no longer becomes simply minimizing a L2 norm.
Do you have any suggestions on how to find the optimum number of line segments?
Let me give you an example of how you could use pwlf to find the best number of line segments. This example just uses a simple penalty parameter to penalize the model complexity, or the number of line segments. Essentially we want to find x such that my_obj is minimized. I then go on to minimize this function using bayesian optimization (or EGO). To see how this is done, check out the newly added example.
# define your objective function
def my_obj(x):
# define some penalty parameter l
# you'll have to arbitrarily pick this
# it depends upon the noise in your data,
# and the value of your sum of square of residuals
l = y.mean()*0.001
f = np.zeros(x.shape[0])
for i, j in enumerate(x):
my_pwlf.fit(j[0])
f[i] = my_pwlf.ssr + (l*j[0])
return f
from piecewise_linear_fit_py.
Would it make sense in your application that breakpoint locations can only occur at a data point? That could greatly reduce the time of the inner optimization.
You could also try using the fitfast()
function instead of the fit()
function within my_obj
.
from piecewise_linear_fit_py.
I want to open this up, because depending upon your problem their may be easier ways to deal with finding the best number of line segments.
If you don't have noise in your data, it may be possible to detect the number of line segments from a simple least squares fit. I try to show that in the first example here (this requires the 0.4.0 version in development)
from piecewise_linear_fit_py.
While this is not the focus of this library, it is possible to use pwlf to find an optimum number of line segments.
The problem with automatically detecting the number of line segments is that the objective function is not as well defined. Many questions will need to be answered. Do you introduce a regularizer? Do you introduce a penalty? If you do not have a lot of noise, it's likely that without some form of regularization or penalty function that you may have as many line segments as you have data points. This no longer becomes simply minimizing a L2 norm.
Do you have any suggestions on how to find the optimum number of line segments?
Let me give you an example of how you could use pwlf to find the best number of line segments. This example just uses a simple penalty parameter to penalize the model complexity, or the number of line segments. Essentially we want to find x such that my_obj is minimized. I then go on to minimize this function using bayesian optimization (or EGO). To see how this is done, check out the newly added example.
# define your objective function def my_obj(x): # define some penalty parameter l # you'll have to arbitrarily pick this # it depends upon the noise in your data, # and the value of your sum of square of residuals l = y.mean()*0.001 f = np.zeros(x.shape[0]) for i, j in enumerate(x): my_pwlf.fit(j[0]) f[i] = my_pwlf.ssr + (l*j[0]) return f
hello there!
i'm currently working on a problem which requires using this package. however, determining the optimal number of line segments in advance is a must. so as i was searching the net i stumbled upon this piece of code in which the underlying idea i assume to be a tradeoff between "ssr" and the "number of line segments". however "j" is a scalar and can not be indexed so i got an error. then i was wondering if i could replace the "j[0]" with "i" or not?
on top of that for i = 1 i receive another error.
from piecewise_linear_fit_py.
The short answer is that x in the above case was expected to be a 2d array.
just do
def my_obj(x):
# x is integer, returns float
l = y.mean()*0.001
my_pwlf.fit(x)
f = my_pwlf.ssr + (l*x)
return f
For a longer answer. I've been working on a better way to do this.
How many data points do you have in your application? What is the upper limit of line segments you are searching for?
from piecewise_linear_fit_py.
The short answer is that x in the above case was expected to be a 2d array.
just do
def my_obj(x): # x is integer, returns float l = y.mean()*0.001 my_pwlf.fit(x) f = my_pwlf.ssr + (l*x) return fFor a longer answer. I've been working on a better way to do this.
How many data points do you have in your application? What is the upper limit of line segments you are searching for?
thanks for the fast response.
the short version solved the issue.
i have 25 data points sampled from "a/x" curve in which "a" is the constant and "x" is the variable. the less line segments i have the better it will be for my following intentions.
from piecewise_linear_fit_py.
@hyperactivetimes Another approach I have been playing around with, that would work well in your case is to compare the model's prediction variance, and compare this to an approximate variance from the data. Ideally this ratio will be 1.0.
This is a proof of concept, that manually tries 2 segments, then 3, then 4, until we've broken the ratio.
import numpy as np
import matplotlib.pyplot as plt
import pwlf
def calcGSJ(data):
"""
Estimate the variance in data u sing the GSJ method
"""
x = data[:, 0]
y = data[:, 1]
n = len(x)
eTilde = np.zeros(len(x)-2)
ci2 = np.zeros(len(x)-2)
# GSJ method
for i in range(1, n-1):
di = x[i+1]-x[i]
if di == 0:
ai = 0.5
else:
ai = (x[i+1] - x[i]) / di
bi = 1.0 - ai
ci2[i-1] = 1.0 / (ai**2 + bi**2 + 1.0)
eTilde[i-1] = ai*y[i-1] + bi*y[i+1] - y[i]
sigmaHat2 = (1.0/(n-2))*np.dot(ci2.T, eTilde**2)
print('GSJ: ', sigmaHat2)
return sigmaHat2
np.random.seed(1231)
# generate sin wave data
x = np.linspace(0, 10, num=100)
y = np.sin(x * np.pi / 2)
# add noise to the data
y = np.random.normal(0, 0.05, 100) + y
data = np.vstack((x, y)).T
sigma_hat = calcGSJ(data)
# initialize piecewise linear fit with your x and y data
my_pwlf = pwlf.PiecewiseLinFit(x, y)
keep_going = True
n_segment = 2
results = {}
while keep_going:
_ = my_pwlf.fitfast(n_segment, pop=10)
nb = my_pwlf.beta.size + my_pwlf.fit_breaks.size - 2
sigma = my_pwlf.ssr / (my_pwlf.n_data - nb)
F = sigma_hat / sigma
results[n_segment] = F
if F > 1.0:
keep_going = False
print(f'best result found with n_segment={n_segment - 1}')
n_segment -= 1
else:
n_segment += 1
# perform the fit again
_ = my_pwlf.fitfast(n_segment, pop=10)
# predict for the determined points
xHat = np.linspace(min(x), max(x), num=10000)
yHat = my_pwlf.predict(xHat)
# plot the results
plt.figure()
plt.plot(x, y, 'o')
plt.plot(xHat, yHat, '-', label=f'pwlf {n_segment}')
_ = my_pwlf.fitfast(n_segment+1, pop=10)
yHat = my_pwlf.predict(xHat)
plt.plot(xHat, yHat, '-', label=f'pwlf {n_segment+1}')
plt.legend()
plt.show()
I would love to hear if this works better, or worse in your application. There could be better ways to do the search for the magic ratio? (binary search, golden ratio, quadratic fits? )
from piecewise_linear_fit_py.
A veeery simplistic approach to this problem is to fit more and more lines to the data until adding more lines does not
give any meaningful improvement. Maybe not the fastest approach but works well enough for my application.
x = series.index.to_numpy()
y = series.to_numpy()
my_pwlf = pwlf.PiecewiseLinFit(x, y)
max_line_count = 10
min_percent_improvement_to_add_line = 80
last_ssr = None
best_pwlf = None
for i in range(2, max_line_count):
#res = my_pwlf.fit(i)
res = my_pwlf.fitfast(i, pop=3) # runs faster than fit but does not guarantee optimal solution
ssr = my_pwlf.ssr
if last_ssr is not None:
improvement_percent = ( last_ssr - ssr )/ last_ssr * 100
if improvement_percent < min_percent_improvement_to_add_line:
break
last_ssr = ssr
best_pwlf = my_pwlf
print(f"best line_count: {i-1}")
xhat = np.linspace(x.min(), x.max(), len(x))
yhat = best_pwlf.predict(xhat)
from piecewise_linear_fit_py.
@thunderbug1 Thanks for sharing this!
One thing to note as the number of line segments approaches the number of data points, SSR will approach zero.
from piecewise_linear_fit_py.
@cjekel Thats very true, from what I have seen so far, the code stops at a reasonable number of lines. One could even add a dynamically set max_line_count if the signal length/type varies to avoid overfitting in edge cases.
from piecewise_linear_fit_py.
Related Issues (20)
- .fit() fails with 1 segment HOT 5
- How to force the fit process to have a fixed Intercept? HOT 1
- Can I get y_values if I have only x and slopes values? HOT 1
- Limit the slope of each segment of the curve HOT 1
- Re-constructing Piecewise PWLF HOT 2
- Set Slope of Segment to 0 HOT 2
- How to fit multiple functions simultaneously HOT 3
- Hi, i want to make sure that there are no fitted lines between points that are too far apart i.e. set a min value( fragment optimization) how to achieve this? HOT 1
- Why last beta is always positive? HOT 3
- How to prevent poor fitting HOT 4
- Create a pwlf using custom coefficient HOT 1
- p values does not seem accurate HOT 3
- Error for coefficients of linear equations HOT 7
- pwlf with unknown line segments HOT 9
- assure the slopes to be lower and lower HOT 1
- divide by zero error in calc slopes if two break points are the same, or if a breakpoint is on the boundary HOT 2
- Issue using .fit() HOT 5
- How to plot segments with fit_breaks information HOT 4
- support random seed on init
- How to calculate prediction intervals? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from piecewise_linear_fit_py.