boulder-investment-technologies / lppls
Library for fitting the LPPLS model to data.
License: MIT License
From collaborator H.C. Shih:
I think there may be a small error in your damping rate code:
(m * abs(b)) / (w * abs(c1 + c2))
You assume c = c1 + c2 here, but
c1 = c·cos(φ) and c2 = c·sin(φ), and sin²(φ) + cos²(φ) = 1.
Thus
(c·sin(φ))² + (c·cos(φ))² = c² = c1² + c2²
c = (c1² + c2²)^(1/2)
The following warning is raised from time to time, but it does not seem to affect the computation of the indicators:
lib/python3.10/site-packages/lppls/lppls.py:640: RuntimeWarning: divide by zero encountered in double_scalars
return (m * np.abs(b)) / (w * np.abs(c))
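Both points above touch the same expression. Here is a minimal sketch of a corrected damping ratio. It assumes the parameterisation c1 = c·cos(φ), c2 = c·sin(φ) from H.C. Shih's note, so the amplitude is |c| = sqrt(c1² + c2²) rather than c1 + c2. The helper name `damping` is hypothetical. Returning `np.inf` when w·|c| is zero is one possible way to avoid the divide-by-zero warning, not what the library currently does.

```python
import numpy as np


def damping(m: float, w: float, b: float, c1: float, c2: float) -> float:
    """Damping ratio D = m|b| / (w|c|), with |c| recovered from c1 and c2."""
    # c1 = c*cos(phi), c2 = c*sin(phi)  =>  |c| = sqrt(c1**2 + c2**2)
    c = np.hypot(c1, c2)
    denom = w * c
    # Check explicitly instead of letting NumPy emit a divide-by-zero warning;
    # a zero log-periodic amplitude is treated as "infinitely damped" here.
    if denom == 0:
        return np.inf
    return (m * abs(b)) / denom


# With c1 = 0.03 and c2 = -0.04, |c| is 0.05, whereas c1 + c2 would give -0.01.
print(damping(m=0.5, w=8.0, b=-1.0, c1=0.03, c2=-0.04))
```

Using `np.hypot` instead of `np.sqrt(c1**2 + c2**2)` also avoids overflow for very large coefficients.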
Possibly create a function for Confidence Interval Data Storage?
import numpy as np
import pandas as pd
from lppls import lppls

# `data` is assumed to be a DataFrame with 'trade_date' (YYYYMMDD strings) and 'close' columns
rows = []
test_dates = data[data.trade_date > '20070801'].trade_date.values
# the literature suggests 25
MAX_SEARCHES = 10000
for count, i in enumerate(test_dates, start=1):
    test_data = data[data.trade_date <= i]
    time = np.linspace(0, len(test_data) - 1, len(test_data))
    # create list of observation data, in this case,
    # daily adjusted close prices of the S&P 500
    price = test_data['close'].tolist()
    # create 2xM matrix (expected format for LPPLS observations)
    observations = np.array([time, price])
    # instantiate a new LPPLS model with the S&P 500 dataset
    lppls_model = lppls.LPPLS(use_ln=True, observations=observations)
    # fit the model to the data and get back the params
    tc, m, w, a, b, c = lppls_model.fit(observations, MAX_SEARCHES, minimizer='Nelder-Mead')
    # visualize the fit
    # lppls_model.plot_fit(observations, tc, m, w)
    print('calculating {} of {}'.format(count, len(test_dates)))
    rows.append({'date': i, 'critical_time': tc, 'critical_time2': tc - len(time),
                 'm': m, 'w': w, 'a': a, 'b': b, 'c': c})
# DataFrame.append was removed in pandas 2.0, so build the frame once at the end
tested = pd.DataFrame(rows, index=[r['date'] for r in rows])
Why do you have to pass the observation data to the model both when you initialize it and when you run `fit`? It would be nice to only do it once.
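Later examples in this thread (for instance `lppls_model.fit(MAX_SEARCHES)`) pass the data only to the constructor, which suggests newer releases already move in this direction. The sketch below shows the pattern using a hypothetical class, not the library's API. It stores the observations in `__init__` and lets `fit` fall back to them, while still allowing an explicit window to be passed.

```python
from typing import Optional

import numpy as np


class ObservationModel:
    """Hypothetical model that keeps its observations after construction."""

    def __init__(self, observations: np.ndarray):
        # Expected shape: 2 x M (row 0 = time, row 1 = price).
        self.observations = np.asarray(observations, dtype=float)

    def fit(self, max_searches: int, obs: Optional[np.ndarray] = None) -> int:
        # Use the stored data unless a specific window is passed explicitly.
        data = self.observations if obs is None else np.asarray(obs, dtype=float)
        # A real implementation would run max_searches minimisations on `data`;
        # this sketch only returns the number of observations it would use.
        return data.shape[1]


model = ObservationModel(np.array([[0, 1, 2, 3], [1.0, 1.1, 1.3, 1.2]]))
n_all = model.fit(max_searches=25)                                    # stored data
n_window = model.fit(max_searches=25, obs=model.observations[:, :2])  # sub-window
```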
I'm trying to apply the same conditions as in this article:
Demirer, Riza & Demos, Guilherme & Gupta, Rangan & Sornette, Didier. (2017). On the Predictability of Stock Market Bubbles: Evidence from LPPLS Confidence™ Multi-scale Indicators. SSRN Electronic Journal. 10.2139/ssrn.3076609.
filter_conditions_config = [
{'condition_1':[
(-0.05, 0.1), # tc_range
(0.01,1.2), # m_range
(6,13), # w_range
2.5, # O_min
0.8, # D_min
]},
{'condition_2':[
(-0.05, 0.1), # tc_range
(0.01,0.99), # m_range
(6,13), # w_range
2.5, # O_min
1.0, # D_min
]}
]
Am I setting the conditions correctly?
I would appreciate any help. Thank you!
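To see what each tuple in such a config would accept, here is a small hypothetical helper that checks one fit against one condition. It assumes the tuple order shown above (tc_range, m_range, w_range, O_min, D_min). It also assumes, and this should be verified against the library source, that tc_range is a fraction of the window length measured from the window's last observation, in the spirit of Demirer et al. (2017). The dictionary keys in `fit` are hypothetical.

```python
from typing import Dict, Sequence


def passes_condition(fit: Dict[str, float], condition: Sequence, t1: float, t2: float) -> bool:
    """Check one fitted parameter set against one condition tuple."""
    tc_range, m_range, w_range, o_min, d_min = condition
    dt = t2 - t1  # window length
    # Assumption: tc must lie in [t2 + lo * dt, t2 + hi * dt].
    tc_ok = t2 + tc_range[0] * dt <= fit["tc"] <= t2 + tc_range[1] * dt
    m_ok = m_range[0] <= fit["m"] <= m_range[1]
    w_ok = w_range[0] <= fit["w"] <= w_range[1]
    return tc_ok and m_ok and w_ok and fit["O"] >= o_min and fit["D"] >= d_min


condition_1 = [(-0.05, 0.1), (0.01, 1.2), (6, 13), 2.5, 0.8]
condition_2 = [(-0.05, 0.1), (0.01, 0.99), (6, 13), 2.5, 1.0]
fit = {"tc": 105.0, "m": 0.5, "w": 9.0, "O": 3.0, "D": 0.9}  # hypothetical fit
print(passes_condition(fit, condition_1, t1=0.0, t2=100.0))  # True  (D = 0.9 >= 0.8)
print(passes_condition(fit, condition_2, t1=0.0, t2=100.0))  # False (D = 0.9 < 1.0)
```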
Hi, I tried to run this:
# define custom filter condition
filter_conditions_config = [
{'condition_1':[
(0.0, 0.1), # tc_range
(0,1), # m_range
(4,25), # w_range
2.5, # O_min
0.5, # D_min
]},
]
# compute the confidence indicator
res = lppls_model.mp_compute_indicator(
workers=32,
window_size=120,
smallest_window_size=30,
increment=5,
max_searches=25,
filter_conditions_config=filter_conditions_config
)
res_df = self.res_to_df(res, 'condition_1')
lppls_model.plot_confidence_indicators(res_df, title='Short Term Indicator 30-120')
And got this error:
NameError Traceback (most recent call last)
in ()
19 filter_conditions_config=filter_conditions_config
20 )
---> 21 res_df = self.res_to_df(res, 'condition_1')
22 lppls_model.plot_confidence_indicators(self, res_df, title='Short Term Indicator 30-120')
23
NameError: name 'self' is not defined
Hi Josh,
I was reading articles on your site where you said that you ran an LPPL backtest for 20 hours on 96 vCPUs.
I have an NVIDIA CUDA card and was wondering whether your LPPL Python code could benefit from CUDA. An NVIDIA CUDA card can easily have 1,000 or more processors, but these processors are limited in what they can do.
Looking forward to hearing your thoughts.
regards
dejan
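Whether a GPU helps depends on how much of the work can be written as large array operations. As an illustration only (not the library's code), the sketch below evaluates the LPPLS curve y = a + (tc - t)^m · (b + c1·cos(w·ln(tc - t)) + c2·sin(w·ln(tc - t))) for many candidate parameter sets at once using NumPy broadcasting. A CUDA array library, for example CuPy, which mirrors much of the NumPy API, could parallelise this kind of formulation.

```python
import numpy as np


def _col(x):
    """Reshape a length-K parameter vector to (K, 1) for broadcasting."""
    return np.asarray(x, dtype=float)[:, None]


def lppls_batch(t, tc, m, w, a, b, c1, c2):
    """Evaluate K LPPLS parameter sets on N time points; returns shape (K, N).

    Every tc must exceed every t, otherwise log/power produce NaN.
    """
    dt = _col(tc) - np.asarray(t, dtype=float)[None, :]  # (K, N)
    f = dt ** _col(m)
    phase = _col(w) * np.log(dt)
    return _col(a) + f * (_col(b) + _col(c1) * np.cos(phase) + _col(c2) * np.sin(phase))


rng = np.random.default_rng(0)
t = np.arange(100.0)
K = 1000
params = {
    "tc": rng.uniform(101.0, 150.0, K), "m": rng.uniform(0.1, 0.9, K),
    "w": rng.uniform(6.0, 13.0, K), "a": np.ones(K), "b": -np.ones(K),
    "c1": np.full(K, 0.1), "c2": np.full(K, 0.1),
}
curves = lppls_batch(t, **params)  # one array expression instead of K Python loops
print(curves.shape)  # (1000, 100)
```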
I think I could squeeze out some speed by applying branchless programming, particularly in the indicator function. Is speed our primary objective? It would come at the cost of some readability.
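In NumPy, "branchless" usually means replacing per-fit if/else checks with boolean array masks. The sketch below assumes the fitted tc, m, w, O and D values of all nested windows have already been collected into arrays. The bounds are placeholders, not the library's defaults, and `qualified_fraction` is a hypothetical name. The indicator is then the share of fits that pass every condition.

```python
import numpy as np


def qualified_fraction(tc, m, w, O, D, t2, dt,
                       tc_range=(-0.05, 0.1), m_range=(0.01, 0.99),
                       w_range=(2.0, 15.0), o_min=2.5, d_min=0.5):
    """Share of fits that pass every filter, using masks instead of if/else."""
    tc, m, w, O, D = (np.asarray(x, dtype=float) for x in (tc, m, w, O, D))
    mask = (
        (tc >= t2 + tc_range[0] * dt) & (tc <= t2 + tc_range[1] * dt)
        & (m >= m_range[0]) & (m <= m_range[1])
        & (w >= w_range[0]) & (w <= w_range[1])
        & (O >= o_min) & (D >= d_min)
    )
    return np.count_nonzero(mask) / max(mask.size, 1)


share = qualified_fraction(
    tc=[105, 120, 101, 99], m=[0.5, 0.5, 1.5, 0.3], w=[8, 8, 8, 8],
    O=[3, 3, 3, 2], D=[1, 1, 1, 1], t2=100.0, dt=100.0,
)
```

The readability cost is visible here: the mask is compact, but the reason a particular fit was rejected is no longer explicit.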
Currently we only run platform tests on `ubuntu-latest`. Could we expand the yml to include `windows-latest`?
Here is my example code:
data = data_loader.nasdaq_dotcom()
time = [pd.Timestamp.toordinal(dt.strptime(t1, '%Y-%m-%d')) for t1 in data['Date']]
price = np.log(data['Adj Close'].values)
observations = np.array([time, price])
MAX_SEARCHES = 25
lppls_model = lppls.LPPLS(observations=observations)
tc, m, w, a, b, c, c1, c2, O, D = lppls_model.fit(MAX_SEARCHES)
res = lppls_model.mp_compute_nested_fits(
workers=8,
window_size=120,
smallest_window_size=30,
outer_increment=1,
inner_increment=5,
max_searches=25,
filter_conditions_config={} # not implemented in 0.6.x
)
Here is the error; it repeats in a loop:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 268, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\kylee\Documents\Stock Science\Portfolio Analysis\main.py", line 392, in <module>
res = lppls_model.mp_compute_nested_fits(
File "C:\Users\kylee\Documents\Stock Science\Portfolio Analysis\venv\lib\site-packages\lppls\lppls.py", line 426, in mp_compute_nested_fits
with Pool(processes=workers) as pool:
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\multiprocessing\context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 212, in __init__
self._repopulate_pool()
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\multiprocessing\pool.py", line 326, in _repopulate_pool_static
w.start()
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\Users\kylee\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Have you ever come across this?
Thanks for any feedback
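This RuntimeError is Python's standard message for starting a `multiprocessing.Pool` at import time under the "spawn" start method, which is the default on Windows. Each worker re-imports main.py, and that import tries to create another Pool. The usual fix is to move the top-level script code under an `if __name__ == "__main__":` guard. Below is a self-contained sketch of the idiom, where the hypothetical `fit_window` stands in for the per-window work that `mp_compute_nested_fits` distributes:

```python
from multiprocessing import Pool


def fit_window(i: int) -> int:
    # Placeholder for per-window work. It must be defined at module level
    # so that spawned workers can import it.
    return i * i


def main() -> list:
    # Everything that starts processes lives inside main(), not at import time.
    with Pool(processes=4) as pool:
        return pool.map(fit_window, range(10))


if __name__ == "__main__":
    print(main())
```

Applied to the script above, the data loading, the `LPPLS(...)` construction and the `mp_compute_nested_fits(...)` call would all move inside `main()`.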
Thanks for your project. I suggest adding the topic `finance` in the About section at https://github.com/Boulder-Investment-Technologies/lppls.
Dear developer,
When I try to run your code, it shows the following error message:
TypeError Traceback (most recent call last)
in ()
30 # fit the model to the data and get back the params
31
---> 32 tc, m, w, a, b, c, c1, c2, O, D = lppls_model.fit(MAX_SEARCHES)
33
34 # visualize the fit
TypeError: fit() missing 1 required positional argument: 'max_searches'
Can you help me solve it? Thanks!
with regards,
Peter
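This error often means the installed package is older than the README example. The earlier snippet in this thread calls `fit(observations, MAX_SEARCHES, minimizer='Nelder-Mead')`. With that signature, `fit(MAX_SEARCHES)` binds 25 to `observations` and leaves `max_searches` unset. A quick way to confirm this is to print the installed version and the method's signature using only the standard library. In the sketch, `old_fit` is a hypothetical stand-in that reproduces the error.

```python
import inspect
from importlib.metadata import PackageNotFoundError, version


def describe(func, dist_name: str) -> str:
    """Report the installed distribution version and a function's signature."""
    try:
        ver = version(dist_name)
    except PackageNotFoundError:
        ver = "not installed"
    return f"{dist_name} {ver}: {func.__name__}{inspect.signature(func)}"


def old_fit(observations, max_searches, minimizer="Nelder-Mead"):
    """Hypothetical stand-in for an older fit() signature."""
    return observations, max_searches


print(describe(old_fit, "lppls"))
# With the real package: from lppls import lppls; print(describe(lppls.LPPLS.fit, "lppls"))
try:
    old_fit(25)
except TypeError as exc:
    print(exc)  # reproduces the "missing 1 required positional argument" error
```

If the signature differs from the README, upgrading the package (for example with `pip install --upgrade lppls`) or following the older calling convention should resolve the mismatch.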
The normal LPPL model fits and predicts future market trends by capturing the cyclicity and acceleration in price trends.
A reversed LPPL model may also be useful for describing cyclicity and deceleration in price trends.
It could be achieved by changing m to -m (so that the exponent -m is allowed to be negative) and changing (tc - t) to (t - tc):
y_2 = a + np.power(t - tc, -m) * (b + ((c1 * np.cos(w * np.log(t - tc))) + (c2 * np.sin(w * np.log(t - tc)))))
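Here are both forms side by side, assuming the reversed model is exactly the y_2 expression above. With m > 0, (t - tc)^(-m) shrinks as t moves away from tc, which gives the decelerating behaviour described. Each function returns NaN outside its domain: t ≥ tc for the normal form and t ≤ tc for the reversed one. The function names are hypothetical.

```python
import numpy as np


def lppls_curve(t, tc, m, w, a, b, c1, c2):
    """Standard LPPLS, defined for t < tc."""
    dt = tc - np.asarray(t, dtype=float)
    return a + np.power(dt, m) * (b + c1 * np.cos(w * np.log(dt)) + c2 * np.sin(w * np.log(dt)))


def reversed_lppls_curve(t, tc, m, w, a, b, c1, c2):
    """Reversed form (y_2 above), defined for t > tc."""
    dt = np.asarray(t, dtype=float) - tc
    return a + np.power(dt, -m) * (b + c1 * np.cos(w * np.log(dt)) + c2 * np.sin(w * np.log(dt)))


t_after = np.array([101.0, 110.0, 200.0])
y2 = reversed_lppls_curve(t_after, tc=100.0, m=0.5, w=8.0, a=1.0, b=1.0, c1=0.0, c2=0.0)
y1 = lppls_curve(np.array([0.0, 90.0]), tc=100.0, m=0.5, w=8.0, a=1.0, b=-1.0, c1=0.0, c2=0.0)
```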
Hi,
I am following exactly the same script as in the Example Use section:
```python
from lppls import lppls, data_loader
import numpy as np
import pandas as pd
from datetime import datetime as dt
%matplotlib inline
data = data_loader.nasdaq_dotcom()
time = [pd.Timestamp.toordinal(dt.strptime(t1, '%Y-%m-%d')) for t1 in data['Date']]
price = np.log(data['Adj Close'].values)
observations = np.array([time, price])
MAX_SEARCHES = 25
lppls_model = lppls.LPPLS(observations=observations)
tc, m, w, a, b, c, c1, c2, O, D = lppls_model.fit(MAX_SEARCHES)
lppls_model.plot_fit()
```
Instead of the chart shown, I got this error message:
TypeError Traceback (most recent call last)
in
22
23 # visualize the fit
---> 24 lppls_model.plot_fit()
25
26 # should give a plot like the following...
~\Anaconda3\lib\site-packages\lppls\lppls.py in plot_fit(self, show_tc)
219 # fontsize=16)
220
--> 221 ax1.plot(time_ord, price, label='price', color='black', linewidth=0.75)
222 ax1.plot(time_ord, lppls_fit, label='lppls fit', color='blue', alpha=0.5)
223 # if show_tc:
~\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in plot(self, scalex, scaley, data, *args, **kwargs)
1665 lines = [*self._get_lines(*args, data=data, **kwargs)]
1666 for line in lines:
-> 1667 self.add_line(line)
1668 self.autoscale_view(scalex=scalex, scaley=scaley)
1669 return lines
~\Anaconda3\lib\site-packages\matplotlib\axes\_base.py in add_line(self, line)
1900 line.set_clip_path(self.patch)
1901
-> 1902 self._update_line_limits(line)
1903 if not line.get_label():
1904 line.set_label('_line%d' % len(self.lines))
~\Anaconda3\lib\site-packages\matplotlib\axes\_base.py in _update_line_limits(self, line)
1922 Figures out the data limit of the given line, updating self.dataLim.
1923 """
-> 1924 path = line.get_path()
1925 if path.vertices.size == 0:
1926 return
~\Anaconda3\lib\site-packages\matplotlib\lines.py in get_path(self)
1025 """
1026 if self._invalidy or self._invalidx:
-> 1027 self.recache()
1028 return self._path
1029
~\Anaconda3\lib\site-packages\matplotlib\lines.py in recache(self, always)
668 if always or self._invalidx:
669 xconv = self.convert_xunits(self._xorig)
--> 670 x = _to_unmasked_float_array(xconv).ravel()
671 else:
672 x = self._x
~\Anaconda3\lib\site-packages\matplotlib\cbook\__init__.py in _to_unmasked_float_array(x)
1388 return np.ma.asarray(x, float).filled(np.nan)
1389 else:
-> 1390 return np.asarray(x, float)
1391
1392
~\Anaconda3\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87
TypeError: float() argument must be a string or a number, not 'Timestamp'
Could you please help me check this?
Thanks
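A likely cause, worth confirming against the installed versions, is that matplotlib has no unit converter registered for pandas `Timestamp` objects, because some pandas versions do not register these converters implicitly. One common workaround is to register them explicitly before calling `lppls_model.plot_fit()`. The other is to upgrade pandas and matplotlib together. Here is a self-contained check of the workaround:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this check
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import register_matplotlib_converters

# Teach matplotlib how to convert pandas Timestamp values.
register_matplotlib_converters()

# Ordinal day numbers, as produced by pd.Timestamp.toordinal in the example.
ordinals = [pd.Timestamp("2000-01-03").toordinal() + k for k in range(5)]
dates = [pd.Timestamp.fromordinal(o) for o in ordinals]

fig, ax = plt.subplots()
ax.plot(dates, np.log([100.0, 101.0, 99.5, 102.0, 103.0]), color="black", linewidth=0.75)
```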
Hello, I just discovered this wonderful library, but I am wondering how to optimize so many parameters in short time intervals. Could you help me with this?
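A standard way to shrink the search (Filimonov and Sornette, 2013) relies on the LPPLS equation being linear in a, b, c1 and c2 once (tc, m, w) are fixed. Those four can then be solved by ordinary least squares, so only three parameters need a nonlinear search. The "SVD did not converge in Linear Least Squares" messages later in this thread are the error NumPy's `np.linalg.lstsq` raises, which suggests the library uses a similar linear solve. Below is a minimal sketch on synthetic data. The helper name is hypothetical.

```python
import numpy as np


def solve_linear_params(t, y, tc, m, w):
    """Given nonlinear (tc, m, w), return least-squares (a, b, c1, c2) and the SSE."""
    dt = tc - np.asarray(t, dtype=float)
    f = dt ** m
    g = f * np.cos(w * np.log(dt))
    h = f * np.sin(w * np.log(dt))
    X = np.column_stack([np.ones_like(f), f, g, h])  # design matrix, shape (N, 4)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)


# Synthetic check: the known linear parameters are recovered when (tc, m, w) are right.
t = np.arange(200.0)
tc, m, w = 230.0, 0.6, 9.0
true = np.array([7.0, -0.05, 0.004, -0.003])
dt = tc - t
y = true[0] + dt**m * (true[1] + true[2] * np.cos(w * np.log(dt)) + true[3] * np.sin(w * np.log(dt)))
coef, sse = solve_linear_params(t, y, tc, m, w)
```

An optimizer such as Nelder-Mead then only has to minimise the returned SSE over (tc, m, w).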
At the moment `fit` does not incorporate filtering logic, which may lead to bogus results that, depending on the conditions, may also vary significantly with the chosen `random.seed`, whilst `mp_compute_nested_fits` has hardcoded filtering logic for `m`, `w`, `damping` and `oscillations`, but not for `error`.
It would be great to be able to define the search space and filtering conditions on both the `fit` and `mp_compute_nested_fits` methods, in such a way that the full set described in Sornette et al. 2015 can be encoded via a configuration option.
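Here is a sketch of what a configurable, reproducible search space could look like. It assumes bounds are given per nonlinear parameter and that tc is sampled strictly after the last observation, which also avoids the log/power warnings caused by a negative tc - t. The `SEARCH_SPACE` keys and the helper are hypothetical, not existing options of `fit` or `mp_compute_nested_fits`. Passing a local NumPy `Generator` seed, instead of relying on the global `random.seed`, keeps each call reproducible on its own.

```python
import numpy as np

# Hypothetical configuration: bounds for the three nonlinear parameters.
# "tc_offset" is measured forward from the last observation, so tc > t2 always.
SEARCH_SPACE = {"tc_offset": (1.0, 60.0), "m": (0.1, 0.9), "w": (6.0, 13.0)}


def sample_seeds(t, n, space=SEARCH_SPACE, seed=None):
    """Draw n starting points (tc, m, w) uniformly inside the configured bounds."""
    rng = np.random.default_rng(seed)
    t2 = float(np.max(t))
    tc = t2 + rng.uniform(*space["tc_offset"], size=n)
    m = rng.uniform(*space["m"], size=n)
    w = rng.uniform(*space["w"], size=n)
    return np.column_stack([tc, m, w])


seeds_a = sample_seeds(np.arange(120.0), n=25, seed=42)
seeds_b = sample_seeds(np.arange(120.0), n=25, seed=42)
```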
Consider adding a version that uses the TensorFlow Probability Nelder-Mead minimizer described here:
https://www.tensorflow.org/probability/api_docs/python/tfp/optimizer/nelder_mead_minimize
This would hopefully allow the minimizer to run on GPUs as well.
from lppls import lppls, data_loader
import numpy as np
import pandas as pd
from datetime import datetime as dt
%matplotlib inline
data = data_loader.nasdaq_dotcom()
time = [pd.Timestamp.toordinal(dt.strptime(t1, '%Y-%m-%d')) for t1 in data['Date']]
price = np.log(data['Adj Close'].values)
observations = np.array([time, price])
from lppls import lppls_cmaes
lppls_model = lppls_cmaes.LPPLSCMAES(observations=observations)
tc, m, w, a, b, c, c1, c2, O, D = lppls_model.fit(max_iteration=2500, pop_size=4)
res = lppls_model.mp_compute_nested_fits(
workers=8,
window_size=120,
smallest_window_size=30,
outer_increment=1,
inner_increment=5,
max_searches=25,
)
lppls_model.plot_confidence_indicators(res)
(80_w,160)-aCMA-ES (mu_w=42.4,w_1=5%) in dimension 3 (seed=470772, Wed Jan 5 15:43:08 2022)
Iterat #Fevals function value axis ratio sigma min&max std t[m:s]
1 160 9.527592475130895e-01 1.0e+00 1.02e+00 7e-02 4e+04 0:00.1
2 320 9.456107662596909e-01 2.0e+00 1.09e+00 8e-02 3e+04 0:00.1
3 480 9.137850385070987e-01 2.9e+00 1.09e+00 8e-02 2e+04 0:00.2
Array must not contain infs or NaNs.
/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/stats.py:6685: RuntimeWarning: divide by zero encountered in double_scalars
relative_diff = (np.abs(f_obs_sum - f_exp_sum) /
ValueError Traceback (most recent call last)
/var/folders/s2/86r_kl2925q1brkhmg1nmfr40000gn/T/ipykernel_25457/2648792023.py in <module>
19 lppls_model = lppls_cmaes.LPPLSCMAES(observations=observations)
20
---> 21 tc, m, w, a, b, c, c1, c2, O, D = lppls_model.fit(max_iteration=2500, pop_size=4)
22
23 # visualize the fit
/opt/anaconda3/lib/python3.9/site-packages/lppls/lppls_cmaes.py in fit(self, max_iteration, factor_sigma, pop_size, obs)
80 while not es.stop() and es.countiter <= max_iteration:
81 solutions = es.ask()
---> 82 solution = [self.fun_restricted(s, obs) for s in solutions]
83 es.tell(solutions, solution)
84 es.logger.add() # write data to disc to be plotted
/opt/anaconda3/lib/python3.9/site-packages/lppls/lppls_cmaes.py in <listcomp>(.0)
80 while not es.stop() and es.countiter <= max_iteration:
81 solutions = es.ask()
---> 82 solution = [self.fun_restricted(s, obs) for s in solutions]
83 es.tell(solutions, solution)
84 es.logger.add() # write data to disc to be plotted
/opt/anaconda3/lib/python3.9/site-packages/lppls/lppls_cmaes.py in fun_restricted(self, x, obs)
41
42 # calculate the chi square
---> 43 error, _ = chisquare(f_obs=res, f_exp=obs[1, :])
44 return error
45
/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/stats.py in chisquare(f_obs, f_exp, ddof, axis)
6850
6851 """
-> 6852 return power_divergence(f_obs, f_exp=f_exp, ddof=ddof, axis=axis,
6853 lambda_="pearson")
6854
/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/stats.py in power_divergence(f_obs, f_exp, ddof, axis, lambda_)
6692 f"of {rtol}, but the percent differences are:\n"
6693 f"{relative_diff}")
-> 6694 raise ValueError(msg)
6695
6696 else:
ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
inf
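The ValueError comes from `scipy.stats.chisquare`, which, as the message says, requires the observed and expected sums to agree to a relative tolerance of 1e-08. A trial LPPLS curve generally does not satisfy that, and a non-finite prediction makes the check fail outright (the printed difference is inf). Below is a suggested replacement for the error line in `fun_restricted`, not what `lppls_cmaes` currently does. It uses the sum of squared residuals and returns a large *finite* penalty for infeasible candidates, because the "Array must not contain infs or NaNs." message above suggests non-finite fitness values break the CMA-ES run. The function name is hypothetical.

```python
import numpy as np

PENALTY = 1e10  # finite on purpose: non-finite fitness values appear to break CMA-ES


def fit_error(pred, observed, penalty=PENALTY):
    """Sum of squared residuals between a trial LPPLS curve and the data."""
    pred = np.asarray(pred, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Infeasible candidates (e.g. tc <= t yields NaN) get a large but finite score.
    if pred.shape != observed.shape or not np.all(np.isfinite(pred)):
        return penalty
    resid = pred - observed
    return float(resid @ resid)


# In fun_restricted, the chi-square line would become:
# error = fit_error(res, obs[1, :])
```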
Hi,
Thanks for posting!
Running your example, I ran into trouble when calling
res_df = res_to_df(res, 'condition_1')
... as the function res_to_df is not recognized, whereas res_df = lppls_model.res_to_df(res, 'condition_1') seems to work fine.
Thanks, Jesper.
19 filter_conditions_config=filter_conditions_config
20 )
---> 21 res_df = res_to_df(res, 'condition_1')
22 lppls_model.plot_confidence_indicators(res_df, title='Short Term Indicator 120-30')
23
NameError: name 'res_to_df' is not defined
Can I use other data instead of the data in nasdaq_dotcom.csv?
Traceback (most recent call last):
File "C:\Users\bao25\Desktop\lppls-master\use.py", line 14, in <module>
time = [pd.Timestamp.toordinal(dt.strptime(t1, '%Y-%m-%d')) for t1 in data['Date']]
File "C:\Users\bao25\Desktop\lppls-master\use.py", line 14, in <listcomp>
time = [pd.Timestamp.toordinal(dt.strptime(t1, '%Y-%m-%d')) for t1 in data['Date']]
File "C:\Users\bao25\anaconda3\lib\_strptime.py", line 568, in _strptime_datetime
tt, fraction, gmtoff_fraction = _strptime(data_string, format)
File "C:\Users\bao25\anaconda3\lib\_strptime.py", line 349, in _strptime
raise ValueError("time data %r does not match format %r" %
ValueError: time data '1994/1/3' does not match format '%Y-%m-%d'
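Other data can be used as long as it is turned into the same 2 × N array of ordinal time and log price. This particular error is only a format mismatch. The file uses dates like 1994/1/3, so the format string needs slashes (`'%Y/%m/%d'`), and `strptime` accepts the unpadded month and day. In the sketch, the column names 'Date' and 'Adj Close' are assumed to match your file, and the small DataFrame stands in for it.

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd

# Stand-in for a user-supplied file whose dates look like '1994/1/3'.
data = pd.DataFrame({"Date": ["1994/1/3", "1994/1/4", "1994/1/5"],
                     "Adj Close": [100.0, 101.5, 100.8]})

# Same approach as the example, with a format string that matches the file.
# strptime accepts unpadded values such as '1' for %m and %d.
time = [dt.strptime(t1, "%Y/%m/%d").toordinal() for t1 in data["Date"]]
price = np.log(data["Adj Close"].values)
observations = np.array([time, price])  # 2 x N: ordinal days, log prices
```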
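I tried to extend the fitted line into future days.
It would be useful to add this option to the lppls_model.plot_fit() function.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# extend the fit 500 days past the last observation
# (assumes data is indexed by date and tc, m, w, a, b, c1, c2 come from lppls_model.fit)
new_dates = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1), periods=500)
# use an array so that tc - t is computed element-wise even when tc is a plain float
t = np.array([pd.Timestamp.toordinal(timestamp) for timestamp in new_dates])
# the LPPLS form is only defined for t < tc; points at or beyond tc become NaN
y = a + np.power(tc - t, m) * (b + ((c1 * np.cos(w * np.log(tc - t))) + (c2 * np.sin(w * np.log(tc - t)))))
plt.plot(new_dates, y)
plt.show()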
I find this interesting.
Despite the procedure being non-deterministic, it would be great to see some unit tests added. Is this something you would like help with?
Could we run the tests on every push instead of only on master? That would allow forks to be checked against the CI build.
In my latest pull request, the tests failed because my tests for failed convergence raised a ValueError (instead of a numpy.linalg.LinAlgError) on ubuntu-latest.
Since I'm using Windows, it's tricky to anticipate some of those platform differences without also testing on the CI build.
The double moving window implementation is unsuitable for most production use, because at the time the indicator is computed, the latest data point is usually not covered by any window.
func_arg_map = [(obs_copy[:, i:window_size + i], window_size, i, smallest_window_size, outer_increment, inner_increment, max_searches) for i in range(0, obs_copy_len + 1, outer_increment)]
I suggest adding an option to start the iteration at (obs_copy_len % outer_increment)
to make it usable in monitoring or backtest applications without branching the project.
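Here is a sketch of the suggested start offset, assuming obs_copy_len is the number of observations minus window_size. Starting at obs_copy_len % outer_increment shifts the grid of window start indices so that the final window always ends on the most recent observation. `window_starts` is a hypothetical helper.

```python
def window_starts(n_obs: int, window_size: int, outer_increment: int, align_to_end: bool = False):
    """Start indices of the outer moving windows over n_obs observations."""
    obs_copy_len = n_obs - window_size
    start = obs_copy_len % outer_increment if align_to_end else 0
    return list(range(start, obs_copy_len + 1, outer_increment))


n_obs, window_size, inc = 1003, 120, 5
default = window_starts(n_obs, window_size, inc)
aligned = window_starts(n_obs, window_size, inc, align_to_end=True)
# The slice obs[:, i:i + window_size] ends (exclusively) at i + window_size.
print(default[-1] + window_size, aligned[-1] + window_size)  # 1000 1003
```

With the default grid, the last three observations are never inside a window. With the aligned grid, the final window ends exactly at n_obs.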
I have been playing with the S&P 500 time series over the past month, trying to compute the bubble confidence indicator, and noticed that neither a bubble early warning nor a bubble end flag is raised when running _func_compute_indicator with window_size=125 and smallest_window_size=20.
The filtering conditions in _func_compute_indicator appear too strict. In particular the condition
n_oscillation = ((w / 2) * np.log(abs((tc - first) / (last - first)))) > 2.5
almost never evaluates to true.
Have you done a similar analysis and noticed the same?
I am aware that these are the conditions cited in all the reference papers from Sornette et al.; however, I found much more reasonable results when using the filtering conditions for the confidence indicator as implemented in the R bubble package by Dean Fantazzini (https://github.com/deanfantazzini/bubble/blob/master/R/LPPL_confidence.R), namely:
n_oscillation = w /( 2 * math.pi) * np.log(abs(tc / (tc - last))) > 2.5
Using this much less strict condition, I end up with the indicator values plotted in the attached file:
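To see how far apart the two quoted expressions are, here is a direct comparison. The library line divides w by 2 rather than 2π and takes the log of (tc - first)/(last - first). That ratio stays close to 1 when tc is near the end of the window, so the log term is small, which is consistent with the condition rarely evaluating to true. The Fantazzini version uses tc/(tc - last), which equals (tc - first)/(tc - last) when time is measured from the start of the window (first = 0). The parameter values below are illustrative only.

```python
import math


def n_osc_library(w, tc, first, last):
    """Oscillation measure as quoted from _func_compute_indicator."""
    return (w / 2) * math.log(abs((tc - first) / (last - first)))


def n_osc_fantazzini(w, tc, last):
    """Oscillation measure as quoted from the R bubble package (LPPL_confidence.R)."""
    return w / (2 * math.pi) * math.log(abs(tc / (tc - last)))


# Illustrative fit: a 125-point window indexed 0..124, tc 10 points past its end.
w, tc, first, last = 8.0, 134.0, 0.0, 124.0
print(n_osc_library(w, tc, first, last), n_osc_fantazzini(w, tc, last))
```

For these values the library expression is about 0.31 and the Fantazzini expression about 3.30, so only the latter clears the 2.5 threshold. Note that the R form grows without bound as tc approaches the last observation.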
Hey there, I'm trying to run this code with a BTC data set; however, when I run it out of the box, I get an error saying that the data_loader variable is not defined.
lppls.py:57: RuntimeWarning: invalid value encountered in log
phase = np.log(deltaT) if self.use_ln else deltaT
lppls.py:58: RuntimeWarning: invalid value encountered in power
fi = np.power(deltaT, m)
minimize failed: SVD did not converge in Linear Least Squares
minimize failed: SVD did not converge in Linear Least Squares
minimize failed: SVD did not converge in Linear Least Squares
Hi,
As I mentioned in the title, I'd like to use minute price data, but it does not seem to work when I just pass it to the module.
If this module can be used to analyze minute data, could you tell me how to do it?
Thank you for your effort and for posting this. It's a really cool module and always helpful.
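The examples convert dates with `toordinal()`, which maps every timestamp within the same day to the same integer, so minute bars end up with many duplicate time values. One possible workaround, offered as an assumption rather than something the library documents, is to use elapsed time as a float (here, minutes since the first bar) for the time row of the observations array. tc, and any filter bounds expressed in time units, are then in minutes as well. The DataFrame below is a hypothetical stand-in for your data.

```python
import numpy as np
import pandas as pd

# Hypothetical minute bars; replace with your own DataFrame.
bars = pd.DataFrame({
    "Datetime": pd.date_range("2024-01-02 09:30", periods=6, freq="min"),
    "Close": [100.0, 100.2, 100.1, 100.4, 100.6, 100.5],
})

# toordinal() collapses all six bars onto the same day number:
print({ts.toordinal() for ts in bars["Datetime"]})

# Elapsed minutes since the first bar give strictly increasing time values.
elapsed = (bars["Datetime"] - bars["Datetime"].iloc[0]).dt.total_seconds() / 60.0
observations = np.array([elapsed.to_numpy(), np.log(bars["Close"].to_numpy())])
```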