It looks like this library requires a user to specify the number of line segments befo

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Number of Line Segments Needs to Be Specified about piecewise_linear_fit_py HOT 10 OPEN

cjekel commented on May 23, 2024

Number of Line Segments Needs to Be Specified

from piecewise_linear_fit_py.

Comments (10)

cjekel commented on May 23, 2024

While this is not the focus of this library, it is possible to use pwlf to find an optimum number of line segments.

The problem with automatically detecting the number of line segments is that the objective function is not as well defined. Many questions will need to be answered. Do you introduce a regularizer? Do you introduce a penalty? If you do not have a lot of noise, it's likely that without some form of regularization or penalty function that you may have as many line segments as you have data points. This no longer becomes simply minimizing a L2 norm.

Do you have any suggestions on how to find the optimum number of line segments?

Let me give you an example of how you could use pwlf to find the best number of line segments. This example just uses a simple penalty parameter to penalize the model complexity, or the number of line segments. Essentially we want to find x such that my_obj is minimized. I then go on to minimize this function using bayesian optimization (or EGO). To see how this is done, check out the newly added example.

# define your objective function
def my_obj(x):
    # define some penalty parameter l
    # you'll have to arbitrarily pick this
    # it depends upon the noise in your data,
    # and the value of your sum of square of residuals
    l = y.mean()*0.001
    f = np.zeros(x.shape[0])
    for i, j in enumerate(x):
        my_pwlf.fit(j[0])
        f[i] = my_pwlf.ssr + (l*j[0])
    return f

from piecewise_linear_fit_py.

cjekel commented on May 23, 2024

Would it make sense in your application that breakpoint locations can only occur at a data point? That could greatly reduce the time of the inner optimization.

You could also try using the fitfast() function instead of the fit() function within my_obj.

from piecewise_linear_fit_py.

cjekel commented on May 23, 2024

I want to open this up, because depending upon your problem their may be easier ways to deal with finding the best number of line segments.

If you don't have noise in your data, it may be possible to detect the number of line segments from a simple least squares fit. I try to show that in the first example here (this requires the 0.4.0 version in development)

from piecewise_linear_fit_py.

hyperactivetimes commented on May 23, 2024

While this is not the focus of this library, it is possible to use pwlf to find an optimum number of line segments.

The problem with automatically detecting the number of line segments is that the objective function is not as well defined. Many questions will need to be answered. Do you introduce a regularizer? Do you introduce a penalty? If you do not have a lot of noise, it's likely that without some form of regularization or penalty function that you may have as many line segments as you have data points. This no longer becomes simply minimizing a L2 norm.

Do you have any suggestions on how to find the optimum number of line segments?

Let me give you an example of how you could use pwlf to find the best number of line segments. This example just uses a simple penalty parameter to penalize the model complexity, or the number of line segments. Essentially we want to find x such that my_obj is minimized. I then go on to minimize this function using bayesian optimization (or EGO). To see how this is done, check out the newly added example.
# define your objective function
def my_obj(x):
    # define some penalty parameter l
    # you'll have to arbitrarily pick this
    # it depends upon the noise in your data,
    # and the value of your sum of square of residuals
    l = y.mean()*0.001
    f = np.zeros(x.shape[0])
    for i, j in enumerate(x):
        my_pwlf.fit(j[0])
        f[i] = my_pwlf.ssr + (l*j[0])
    return f

hello there!
i'm currently working on a problem which requires using this package. however, determining the optimal number of line segments in advance is a must. so as i was searching the net i stumbled upon this piece of code in which the underlying idea i assume to be a tradeoff between "ssr" and the "number of line segments". however "j" is a scalar and can not be indexed so i got an error. then i was wondering if i could replace the "j[0]" with "i" or not?
on top of that for i = 1 i receive another error.

from piecewise_linear_fit_py.

cjekel commented on May 23, 2024

The short answer is that x in the above case was expected to be a 2d array.

just do

def my_obj(x):
    # x is integer, returns float
    l = y.mean()*0.001
    my_pwlf.fit(x)
    f = my_pwlf.ssr + (l*x)
    return f

For a longer answer. I've been working on a better way to do this.

How many data points do you have in your application? What is the upper limit of line segments you are searching for?

from piecewise_linear_fit_py.

hyperactivetimes commented on May 23, 2024

The short answer is that x in the above case was expected to be a 2d array.

just do
def my_obj(x):
    # x is integer, returns float
    l = y.mean()*0.001
    my_pwlf.fit(x)
    f = my_pwlf.ssr + (l*x)
    return f
For a longer answer. I've been working on a better way to do this.

How many data points do you have in your application? What is the upper limit of line segments you are searching for?

thanks for the fast response.
the short version solved the issue.
i have 25 data points sampled from "a/x" curve in which "a" is the constant and "x" is the variable. the less line segments i have the better it will be for my following intentions.

from piecewise_linear_fit_py.

cjekel commented on May 23, 2024

@hyperactivetimes Another approach I have been playing around with, that would work well in your case is to compare the model's prediction variance, and compare this to an approximate variance from the data. Ideally this ratio will be 1.0.

This is a proof of concept, that manually tries 2 segments, then 3, then 4, until we've broken the ratio.

import numpy as np
import matplotlib.pyplot as plt
import pwlf


def calcGSJ(data):
    """
    Estimate the variance in data u sing the GSJ method
    """
    x = data[:, 0]
    y = data[:, 1]
    n = len(x)
    eTilde = np.zeros(len(x)-2)
    ci2 = np.zeros(len(x)-2)
    #    GSJ method
    for i in range(1, n-1):
        di = x[i+1]-x[i]
        if di == 0:
            ai = 0.5
        else:
            ai = (x[i+1] - x[i]) / di
        bi = 1.0 - ai
        ci2[i-1] = 1.0 / (ai**2 + bi**2 + 1.0)
        eTilde[i-1] = ai*y[i-1] + bi*y[i+1] - y[i]        
    sigmaHat2 = (1.0/(n-2))*np.dot(ci2.T, eTilde**2)
    print('GSJ: ', sigmaHat2)
    return sigmaHat2


np.random.seed(1231)
# generate sin wave data
x = np.linspace(0, 10, num=100)
y = np.sin(x * np.pi / 2)
# add noise to the data
y = np.random.normal(0, 0.05, 100) + y

data = np.vstack((x, y)).T

sigma_hat = calcGSJ(data)

# initialize piecewise linear fit with your x and y data
my_pwlf = pwlf.PiecewiseLinFit(x, y)

keep_going = True
n_segment = 2
results = {}
while keep_going:
    _ = my_pwlf.fitfast(n_segment, pop=10)
    nb = my_pwlf.beta.size + my_pwlf.fit_breaks.size - 2
    sigma = my_pwlf.ssr / (my_pwlf.n_data - nb)
    F = sigma_hat / sigma
    results[n_segment] = F
    if F > 1.0:
        keep_going = False
        print(f'best result found with n_segment={n_segment - 1}')
        n_segment -= 1
    else:
        n_segment += 1

# perform the fit again
_ = my_pwlf.fitfast(n_segment, pop=10)
# predict for the determined points
xHat = np.linspace(min(x), max(x), num=10000)
yHat = my_pwlf.predict(xHat)

# plot the results
plt.figure()
plt.plot(x, y, 'o')
plt.plot(xHat, yHat, '-', label=f'pwlf {n_segment}')
_ = my_pwlf.fitfast(n_segment+1, pop=10)
yHat = my_pwlf.predict(xHat)
plt.plot(xHat, yHat, '-', label=f'pwlf {n_segment+1}')
plt.legend()
plt.show()

I would love to hear if this works better, or worse in your application. There could be better ways to do the search for the magic ratio? (binary search, golden ratio, quadratic fits? )

from piecewise_linear_fit_py.

thunderbug1 commented on May 23, 2024

A veeery simplistic approach to this problem is to fit more and more lines to the data until adding more lines does not
give any meaningful improvement. Maybe not the fastest approach but works well enough for my application.

x = series.index.to_numpy()
y = series.to_numpy()
my_pwlf = pwlf.PiecewiseLinFit(x, y)
max_line_count = 10
min_percent_improvement_to_add_line = 80

last_ssr = None
best_pwlf = None
for i in range(2, max_line_count):
    #res = my_pwlf.fit(i)
    res = my_pwlf.fitfast(i, pop=3) # runs faster than fit but does not guarantee optimal solution
    ssr = my_pwlf.ssr
    if last_ssr is not None:
        improvement_percent = ( last_ssr - ssr )/ last_ssr * 100
        if improvement_percent < min_percent_improvement_to_add_line:
            break
    last_ssr = ssr
    best_pwlf = my_pwlf
print(f"best line_count: {i-1}")
xhat = np.linspace(x.min(), x.max(), len(x))
yhat = best_pwlf.predict(xhat)

from piecewise_linear_fit_py.

cjekel commented on May 23, 2024

@thunderbug1 Thanks for sharing this!

One thing to note as the number of line segments approaches the number of data points, SSR will approach zero.

from piecewise_linear_fit_py.

thunderbug1 commented on May 23, 2024

@cjekel Thats very true, from what I have seen so far, the code stops at a reasonable number of lines. One could even add a dynamically set max_line_count if the signal length/type varies to avoid overfitting in edge cases.

from piecewise_linear_fit_py.

Number of Line Segments Needs to Be Specified about piecewise_linear_fit_py HOT 10 OPEN

Comments (10)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent