danielbok / copulae Goto Github PK
View Code? Open in Web Editor NEWMultivariate data modelling with Copulas in Python
Home Page: https://copulae.readthedocs.io/en/latest/
License: MIT License
Multivariate data modelling with Copulas in Python
Home Page: https://copulae.readthedocs.io/en/latest/
License: MIT License
ImportError Traceback (most recent call last)
in
----> 1 import copulae
~/anaconda3/lib/python3.7/site-packages/copulae/init.py in
1 from copulae._version import get_versions
----> 2 from copulae.archimedean import *
3 from copulae.elliptical import *
4 from copulae.empirical import *
5 from copulae.indep import *
~/anaconda3/lib/python3.7/site-packages/copulae/archimedean/init.py in
----> 1 from .clayton import ClaytonCopula
2 from .frank import FrankCopula
3 from .gumbel import GumbelCopula
4
5 all = ['ClaytonCopula', 'FrankCopula', 'GumbelCopula']
~/anaconda3/lib/python3.7/site-packages/copulae/archimedean/clayton.py in
2 import numpy.random as rng
3
----> 4 from copulae.copula import Summary, TailDep
5 from copulae.core import EPS, valid_rows_in_u
6 from copulae.stats import random_uniform
~/anaconda3/lib/python3.7/site-packages/copulae/copula/init.py in
1 from .abstract import AbstractCopula
----> 2 from .base import BaseCopula, TailDep
3 from .summary import Summary
~/anaconda3/lib/python3.7/site-packages/copulae/copula/base.py in
5
6 from copulae.copula.abstract import AbstractCopula
----> 7 from copulae.copula.estimator import CopulaEstimator
8 from copulae.core import pseudo_obs
9 from copulae.types import Array, Numeric
~/anaconda3/lib/python3.7/site-packages/copulae/copula/estimator/init.py in
----> 1 from .estimator import fit_copula, EstimationMethod
~/anaconda3/lib/python3.7/site-packages/copulae/copula/estimator/estimator.py in
6 from copulae.core import tri_indices
7 from copulae.stats import pearson_rho
----> 8 from copulae.types import Array, Ties
9 from copulae.utility.dict import merge_dict
10 from .corr_inversion import estimate_corr_inverse_params
ImportError: cannot import name 'Ties' from 'copulae.types' (/home/zakariae/anaconda3/lib/python3.7/site-packages/copulae/types.py)
Hey @DanielBok thank for you for making this great package for us py users.
Ok managed to get it to work perfectly with genpareto, didn't realize it was flexible to work with any distribution, until I dove in the code... if you need help with a certain aspect of it... let me know... I would be happy to contribute π
Usage example :
from copulae import GumbelCopula
cop = GumbelCopula(dim=2)
cop.fit(SOME_DATA)
cop.plot_cdf()
cop.plot_pdf()
Hello, based on my understanding of copulas and experience using them in other languages, I believe the input data must be uniformly distributed before it can be fit to a copula. This is specified in definition 1, point 2 here. This is appears to be consistent with your example, where you fit a gaussian copula to data and state that the function will internally convert the data to psuedo observations here. I have a few questions:
Thank you!
According to you documentation of this library when you do t_cop.sumary()
(in this case for a student copula) it should also return the fit summary of it, however and as we can see in the code it just returns the Degree of Freedom and Correlation Matrix (normal summary) and not the other information it should return according to the documentatio (see attached image). Is there any intention of completing this or was that part in the documentation a mistake. Thanks for the help.
Hello,
I am trying to use the EmpiricalCopula class and I am having some difficulties.
Example code :
from copulae import EmpiricalCopula
import numpy as np
np.random.seed(8)
data = np.random.uniform(size=(3000, 3))
cop = EmpiricalCopula(dim=3, data=data)
print(cop.random(10)) # simulate random number
print(cop.cdf(np.array([[0.1, 0.5, 1]])))
I find that the cop.cdf always gives the same number no matter what I use in the input. Am I using it wrong?
Thanks for writing this package.
Best,
Got the following error trying to install copulae
:
$ pip wheel --use-pep517 "copulae (==0.7.7)" [14:42:45]
Collecting copulae==0.7.7
Using cached copulae-0.7.7.tar.gz (660 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
Γ Getting requirements to build wheel did not run successfully.
β exit code: 1
β°β> [21 lines of output]
Traceback (most recent call last):
File "/root/venv/lib/python3.11/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 351, in <module>
main()
File "/root/venv/lib/python3.11/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 333, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/venv/lib/python3.11/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-hde_99ze/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-hde_99ze/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-hde_99ze/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 488, in run_setup
self).run_setup(setup_script=setup_script)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-hde_99ze/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 338, in run_setup
exec(code, locals())
File "<string>", line 8, in <module>
ModuleNotFoundError: No module named 'numpy'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
Γ Getting requirements to build wheel did not run successfully.
β exit code: 1
β°β> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
numpy
was installed.
Hi copula devs. Just stopping by to let you know that could use this library to continuously set up estimation and submission, and maybe win a prize. It's easy to use github actions to set and forget. I'd have put this in discussion if it were enabled.
Open up https://lnkd.in/gUm4Ns5 in colab and run it to generate yourself a write key.
Save the key as a Github secret called WRITE_KEY
Click on "accept" when Github asks you if you want to enable GitHub actions. Go to Actions and you'll see the only action used in your repo (similar to https://lnkd.in/gv_Agt6). Enable it.
Or see links from this post.
https://www.linkedin.com/posts/petercotton_micropredictionmicroactors-activity-6746618350325104640-UIW3
Hello,
I found that 2 tests in the testsuite are failing. Details are here:
copulae_failures.txt
My environment is following:
appdirs==1.4.4
attrs==22.1.0
autocommand==2.2.2
cffi==1.15.1
cryptography==38.0.3
Cython==0.29.32
exceptiongroup==1.0.4
inflect==6.0.2
iniconfig==1.1.1
jaraco.context==4.2.0
jaraco.functools==3.5.2
jaraco.text==3.11.0
joblib==1.2.0
more-itertools==9.0.0
numpy==1.23.5
ordered-set==4.1.0
packaging==21.3
pandas==1.5.1
patsy==0.5.3
pluggy==1.0.0
ply==3.11
pycparser==2.21
pydantic==1.10.2
pyparsing==3.0.9
pytest==7.2.0
python-dateutil==2.8.2
pytz==2022.6
scikit-learn==1.1.3
scipy==1.9.3
six==1.16.0
statsmodels==0.13.5
threadpoolctl==3.1.0
tomli==2.0.1
trove-classifiers==2022.10.19
typing_extensions==4.4.0
validate-pyproject==0.10.1
wrapt==1.14.1
Arch Linux, GCC 12.2.0
Anton K.
Hey @DanielBok , I was experimenting with the package here, I've installed the latest from conda.
I have this dataset link
which has a dimension of 2, being mostly gamma or pareto marginals.
Running the below code:
from copulae import FrankCopula
distTypes = ['gamma', 'gamma']
df = pd.read_csv("C:/...")
cop = MarginalCopula(FrankCopula(dim=2), distTypes)
cop.fit(df)
Gives back tracebackerror, where I inserted a print statement in params to see what is triggering the assertion:
31.92679324563328
31.92679324563328
31.92679324563328
31.926793260633282
1028.2277099641103
1028.2277099791104
nan
Traceback (most recent call last):
File "<ipython-input-15-5f9a10d4956d>", line 2, in <module>
cop.fit(self.df)
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\marginal\marginal.py", line 143, in fit
self._copula.fit(data, x0, method, optim_options, ties, **kwargs)
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\copula\base.py", line 112, in fit
CopulaEstimator(self, data, x0=x0, method=method, verbose=verbose, optim_options=optim_options)
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\copula\estimator\estimator.py", line 65, in __init__
self.fit() # fit the copula
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\copula\estimator\estimator.py", line 70, in fit
MaxLikelihoodEstimator(self.copula, self.data, self.initial_params, self.options, self.verbose).fit(m)
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\copula\estimator\max_likelihood.py", line 55, in fit
res: OptimizeResult = minimize(self.copula_log_lik, self.initial_params, **self.optim_options)
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\scipy\optimize\_minimize.py", line 626, in minimize
constraints, callback=callback, **options)
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\scipy\optimize\slsqp.py", line 422, in _minimize_slsqp
fx = sf.fun(x)
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\scipy\optimize\_differentiable_functions.py", line 182, in fun
self._update_fun()
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\scipy\optimize\_differentiable_functions.py", line 166, in _update_fun
self._update_fun_impl()
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\scipy\optimize\_differentiable_functions.py", line 73, in update_fun
self.f = fun_wrapped(self.x)
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\scipy\optimize\_differentiable_functions.py", line 70, in fun_wrapped
return fun(x, *args)
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\copula\estimator\max_likelihood.py", line 87, in copula_log_lik
return -self.copula.log_lik(self.data, to_pobs=False)
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\copula\base.py", line 155, in log_lik
return self.pdf(data, log=True).sum()
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\utility\utils.py", line 36, in internal
res = np.asarray(f(cls, x, *args, **kwargs))
File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\archimedean\frank.py", line 123, in pdf
assert not np.isnan(self.params), "Copula must have parameters to calculate parameters"
AssertionError: Copula must have parameters to calculate parameters
I am not sure if it is the way I am running the package that is the issue. Aside from that with other copulas, I've tried generating random samples and plotting them on a chart, that results are surprising. Maybe I am not using the package wrong. One example below which uses the clayton copula gave back results that don't seem to fit what you see empirically, although maybe clayton should be able to capture the degree of relation between the 2 rvs.
The About area of the GitHub project page shows a link to https://copulae.readthedocs.io/en/stable/copulae/index.html, which is broken.
You might want to change this to https://copulae.readthedocs.io/en/latest/, as given in the README β or at least to https://copulae.readthedocs.io/en/stable/.
Hi,
Would the package provide support for discrete marginal distributions. Or will it support in future.
Not an issue.
I just wanted to thank you your work in this library. It is challenging to find good copula implementations in Python, and subjectively I believe your library is the best one I have found so far.
Thanks!
Hi Daniel,
I am comparing the copula fit between copulae and SAS and I am observing different results. In the linked notebook, I create random data and conduct the probability integral transform using different distributions for the clayton copula. Regardless of which distribution I choose to conduct probability integral transform, the theta parameter for Clayton is the same. This was not my expectation but conducting the same copula fitting in SAS with the exact same uniformly distributed marginals provides a bit of context.
When fitting a copula in SAS, you can specify whether your marginals are already uniformly distributed, or if they need to converted to uniformly distributed via the empirical CDF. First, I specified the marginals as already uniformly distributed, with results here. As expected (and different than the jupyter notebook), the theta parameter of the Clayton copula changed based on which distribution I used to conduct probability integral transform. The case where the input marginals were transformed to uniformly distributed via the empirical CDF yielded the exact same theta as copulae, providing me confidence in the fitting done for copulae.
Next, I fit all the uniformly distributed marginals to the Clayton copula in SAS while specifying that the marginals needed to be converted to uniformly distributed via the empirical CDF. In this iteration, the theta parameter estimated for the clayton copula was the same in all cases and identical to the estimate of the copulae. Those results are found here.
In conclusion, it appears copulae is converting the marginals to uniformly distributed via the empirical CDF regardless of the whether the data is already uniformly distributed or not. One of the biggest decisions when fitting a copula is choice of marginal distribution which should affect the fit of the copula as demonstrated by SAS. Is there a parameter I should change in the fit function such that copulae does not automatically convert marginals to uniformly distributed which should result in a different fit based on the distributions chosen to conduct the probability integral transform? Thanks!
Hi Daniel,
I want to understand the logic behind this function. Any reference to paper/book behind the constants and overall approach would be highly appreciated.
def gmm_marginal_cdf(x: np.ndarray, param: GMCParam):
"""
Approximates the inverse cdf of the input given the GMC parameters
Parameters
----------
x : np.ndarray
Vector of quantiles
param : GMCParam
The Gaussian Mixture Copula parameters
Returns
-------
np.ndarray
Cumulative distribution function evaluated at x
"""
sigmas = np.repeat(np.sqrt([np.diag(c) for c in param.covs]).T[np.newaxis, ...], len(x), axis=0)
means = np.repeat(param.means.T[np.newaxis, ...], len(x), axis=0)
**a1 = 0.3480242
a2 = -0.0958798
a3 = 0.7478556
rho = 0.47047
sqrt2 = 1.4142136**
_zi = (np.repeat(x[..., np.newaxis], param.n_clusters, axis=2) - means) / (sigmas * sqrt2)
za = np.abs(zi)
t = 1 / (1 + rho * za)_
chunk = 0.5 * (a1 * t + a2 * t ** 2 + a3 * t ** 3) * np.exp(-(za ** 2))
return np.where(zi < 0, chunk, 1 - chunk) @ param.
Hi,
I am trying to reproduce some results from an experiment I have done in r using the package βcopulaβ.
all the results match except when the data has zeros:
here is an example: I use the sample parameters in both sides [empirical copula with no somthing]
When the data has zeros: (results don't match)
data:
x
0.8720320
0.9472037
0.3241581
0.9226000
0.9763799
y
0.4154456
0.0000000
0.0000000
0.0000000
0.0000000
resulted cdf in r : 0.4, 0.0, 0.0, 0.0, 0.0
resulted cdf using "copulae" package: 0.4, 0.6, 0.2, 0.4, 0.8
Another example with no zeros: (results match)
data:
x
0.8993826,
0.8591301,
0.2108277,
0.0000000,
0.2508839,
y
0.9433767
0.7802619
0.5243279
0.9659404
0.5459261
resulted cdf in r : 0.8, 0.6, 0.2, 0.2, 0.4
resulted cdf using "copulae" package: 0.8, 0.6, 0.2, 0.2, 0.4
here is my script:
from copulae import EmpiricalCopula
import pandas as pd
df = pd.DataFrame(list(zip(l, l1)), columns =['x', 'y'])
emp_cop = EmpiricalCopula(2, df, smoothing="none")
data = emp_cop.data
print(emp_cop.cdf(data))
I tried to look into the code to figure out where this difference comes from but with no success? any idea?
There's a typo in line 4 of
https://copulae.readthedocs.io/en/stable/copulae/archimedean/index.html
Original:
However, more specifically, an Archimedean is defined as
Correct version:
the uniform random variable has been mistaken for
BTW
I'm not familiar with the rst file, so I don't know how to correctly make a commit and PR sry.
Hello,
I am trying to use Gaussian copula mixture with non-gaussian distributions or empirical distributions. I am not sure, how can I get the dependence structure for non-gaussian distributions.
Sorry didn't find another way to ask this?
Hey @DanielBok ,
if i try to fit Frank Copula on a dataset of dim 2 i have this error
AssertionError: Copula must have parameters to calculate parameters
in particular when the parameter of Frank Copula should be negative the code doesn't work even if in dimension 2.
Furthermore, very often using the Gaussian copula with some data it reports the error in the _mutivariate.py script because the covariance matrix becomes non-symmetric and not positive definite during the fitting.
Concretely, I executed the code you provided as an example:
from copulae import EmpiricalCopula
from copulae.datasets import load_marginal_data
df = load_marginal_data()
df.head(3)
STUDENT NORM EXP
0 -0.485878 2.646041 0.393322
1 -1.088878 2.906977 0.253731
2 -0.462133 3.166951 0.480696
emp_cop = EmpiricalCopula(3, df, smoothing="beta")
data = emp_cop.data # getting the pseudo-observation data (this is the converted df)
data[:3]
array([[0.32522493, 0.1886038 , 0.55781406],
[0.15161613, 0.39953349, 0.40953016],
[0.33622126, 0.65611463, 0.62645785]])
emp_cop.cdf(data[:2])
array([0.06865595, 0.06320104])
emp_cop.pdf([[0.5, 0.5, 0.5]])
The problem is that whatever arguments I input into emp_cop.pdf([[0.5, 0.5, 0.5]]), I get the same output 0.009268568506099015. For example, if you run emp_cop.pdf([[0.62, 0.91, 0.34]]) I again get 0.00926856850609901. Per my understanding, this number should represent a copula density (pdf) which should vary for different combinations of pseudo-observation values. Yet, it is not the case.
Any advice from your side would be highly appreciated. I created a Collab notebook you can directly access if you have a few minutes to help: https://colab.research.google.com/drive/1GQT7T4HcaDwT2zlCPPa22hGPASBUJTAV?usp=sharingβ©.
Hi, thanks for your really nice package!
I was wondering if there is a good reference you can provide for the gaussian mixture copula that describes the method.
Many thanks!
Hi Daniel,
I updated copulae to the newest version 0.7.3 and the notebook I previously used to test copulae no longer works. Here is a snapshot of the first part of the code:
import copulae
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as st
from statsmodels.distributions.empirical_distribution import ECDF
from copulae import ClaytonCopula, FrankCopula, GumbelCopula
np.random.seed(100)
v1 = np.sort(st.norm.rvs(loc=0,scale=1.5,size=10000))
v2 = st.norm.rvs(loc=0,scale=1,size=10000)
v3 = np.sort(st.norm.rvs(loc=0,scale=1.5,size=10000))
v4 = st.norm.rvs(loc=0,scale=1,size=10000)
total_sims = pd.DataFrame([0.1*(v1+v2),0.1*(v3+v4)],index=['V1','V2']).T
plt.scatter(total_sims['V1'], total_sims['V2']);
clay_cop = ClaytonCopula(theta=1,dim=2)
clay_cop.fit(data=total_sims, x0=None)
This code yields the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-c5dff29e8c64> in <module>
21
22 clay_cop = ClaytonCopula(theta=1,dim=2)
---> 23 clay_cop.fit(data=total_sims, x0=None)
~\Miniconda3\lib\site-packages\copulae\copula\base.py in fit(self, data, x0, method, optim_options, ties, verbose, to_pobs, **kwargs)
144
145 x0 = np.asarray(x0) if x0 is not None and not isinstance(x0, np.ndarray) and isinstance(x0, Collection) else x0
--> 146 self._fit_smry = fit_copula(self, data, x0, method, verbose, optim_options, kwargs.get('scale', 1))
147
148 if isinstance(data, pd.DataFrame):
~\Miniconda3\lib\site-packages\copulae\copula\estimator\estimator.py in fit_copula(copula, data, x0, method, verbose, optim_options, scale)
92 m = method.lower()
93 if m in {'ml'}:
---> 94 x0 = initial_params(copula, data, x0)
95 return estimate_max_likelihood_params(copula, data, x0, options, verbose, scale)
96 elif m in ('itau', 'irho'):
~\Miniconda3\lib\site-packages\copulae\copula\estimator\estimator.py in initial_params(copula, data, x0)
117
118 try:
--> 119 start = estimate_corr_inverse_params(copula, data, 'itau').params
120 ll = copula.log_lik(data)
121
~\Miniconda3\lib\site-packages\copulae\copula\estimator\corr_inversion.py in estimate_corr_inverse_params(copula, data, type_)
32 raise ValueError("Correlation Inversion must be either 'itau' or 'irho'")
33
---> 34 icor = fit_cor(copula, data, type_)
35
36 if is_elliptical(copula):
~\Miniconda3\lib\site-packages\copulae\copula\estimator\corr_inversion.py in fit_cor(copula, data, typ)
69 indices = tri_indices(copula.dim, 1, 'lower')
70 if typ == 'itau':
---> 71 tau = kendall_tau(data)[indices]
72 theta = copula.itau(tau)
73 elif typ == 'irho':
~\Miniconda3\lib\site-packages\copulae\stats\correlation.py in kendall_tau(x, y, use)
162 %s
163 """
--> 164 return corr(x, y, 'kendall', use)
165
166
~\Miniconda3\lib\site-packages\copulae\stats\correlation.py in corr(x, y, method, use)
101 raise ValueError('x must be a matrix with dimension 2')
102 c = np.identity(x.shape[1])
--> 103 for (i, j), (c1, c2) in _yield_vectors(x, use):
104 c[i, j] = c[j, i] = compute_corr(c1, c2)
105
~\Miniconda3\lib\site-packages\copulae\stats\correlation.py in _yield_vectors(x, use)
256 yield (i, j), (x[v, i], x[v, j])
257 else:
--> 258 yield (i, j), (x[:, i], x[:, j])
259
260
~\Miniconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
~\Miniconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2893 casted_key = self._maybe_cast_indexer(key)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
2897 raise KeyError(key) from err
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
TypeError: '(slice(None, None, None), 0)' is an invalid key
Given this error did not occur with the previous version, is it possible there is a bug in the update? I played around with the inputs to the fit function with no success. Thanks!
Hi Daniel,
I am working on a project and I want to give a fixed correlation matrix to a student t copula. This is easily done after the fit and overwriting the correlation matrix. However, is it possible, in the same manner as fixing the degrees of freedom beforehand, to fix the correlation matrix? This will heavily improve my computation time in copulae.fit(). Thanks in advance.
Warm regards,
Loek
I'm new to using copulas and have been finding this package really helpful. Going through the explainer on copulas here I noticed that the following steps were followed when generating random variables from copulas:
My question is if I'm trying to implement the above workflow entirely using the copulae package, and I start by creating, say an Gumbel copula object, and then call the random() method on it and ask it to generate a 1000 samples, do I get the equivalent of what STEP1 above created or STEP2?
Do I take the result of random() and just apply my target distributions' PPFs to it like the below?
from copulae import GumbelCopula
import pandas as pd
from scipy import stats
alphas = [2.43, 1.97, 2.21, 3.01, 3.22] # Target random variables are 5 pareto distributed vars with these parameters
ndim = len(alphas)
gmbl_cop = GumbelCopula(theta=2, dim=ndim)
u_df = pd.DataFrame(gmbl_cop.random(1000))
x_df = pd.DataFrame()
for i in range(ndim):
col = df.columns[i]
x_df[col] = df[col].apply(lambda x: stats.pareto.ppf(x, alphas[i]-1))
Thanks
Hi,
I am getting the following error message when trying to pip install copulae
:
Collecting copulae
Using cached https://files.pythonhosted.org/packages/66/e4/d727292d72ed1319bea836b0cdbfc304d1382f4cdf8d40017a57bbf9b0e9/copulae-0.7.5-cp36-cp36m-manylinux1_x86_64.whl
Collecting scipy>=1.5 (from copulae)
Using cached https://files.pythonhosted.org/packages/bb/bb/944f559d554df6c9adf037aa9fc982a9706ee0e96c0d5beac701cb158900/scipy-1.7.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-ivoevpoe/scipy/setup.py", line 31, in <module>
raise RuntimeError("Python version >= 3.7 required.")
RuntimeError: Python version >= 3.7 required.
This is contradictory to the docs - https://copulae.readthedocs.io/en/latest/getting-started.html which state 3.6 will be supported until 23 December 2021.
How can this be fixed? Thanks
File "copulae\special_specfunc.pyx", line 1, in init copulae.special._specfunc
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
copulae versionοΌ0.7.8
pythonοΌ3.9
THANKS
When using 'from copulae import GaussianCopula' I get this warning:
ModuleNotFoundError: No module named 'copulae.special._specfunc'
Anyone knows how to solve this?
g_cop = elliptical.GaussianCopula(dim=ndim)
print("fit the copula to inputmatrix (shape = %s)" % (inputmatrix.shape,)) #(150,120)
g_cop.fit(inputmatrix)
https://copulae.readthedocs.io/en/latest/copulae/archimedean/gumbel.html
It says the formula is
C_\theta (u_1, \dots, u_d) = \exp( - (\sum_u^i (-\log u_{i})^{-\theta} )^{1/\theta})
It seems this formula should be
C_\theta (u_1, \dots, u_d) = \exp( - (\sum_u^i (-\log u_{i})^{\theta} )^{1/\theta})
Hey @DanielBok , happy new year & MC , just wanted to ask you about the last release, whenever I run a frank cop, it seems to start print things in stdout... maybe they are residuals that are being printed, maybe some verbose is set to 1 in the frankcopula... I tried looking to where that was, but didn't manage π
Thanks again
Google colab...
Collecting copulae
Downloading https://files.pythonhosted.org/packages/0b/b3/00723b63c2d985efaba8df6c48aaefdd7fc35bbf09b6ed4936fa0b6400c6/copulae-0.5.0.tar.gz (774kB)
|ββββββββββββββββββββββββββββββββ| 778kB 7.7MB/s
ERROR: Package 'copulae' requires a different Python: 3.6.9 not in '>=3.7'
Hi,
Thank you very much for your work! This has helped me immensely! As far as I know, the results of copula are normalized. For example, the range of my data before normalization is [0,100], how can I calculate the probability density corresponding to data 8 after using copula?
When using method = "itau"
for Student Copula it returns the following error for line 38 of copulae\copula\estimator\corr_inversion.py in estimate_corr_inverse_params(copula, data, type_):
36 if is_elliptical(copula):
37 estimate = icor
---> 38 copula.params[:] = estimate
39 elif is_archimedean(copula):
40 estimate = np.mean(icor)
TypeError: 'StudentParams' object does not support item assignment
Using GaussianCopula with method = "itau"
does not return an error
Similar to a previous user, I am trying to use the Gaussian or Student-t copula fit function, and they are crashing due to singular matrices for my data set. My data set is 2 dimensional with 250 entries, and the two columns are not linear combinations of each other, or approximate linear combinations.
This is a small example I generated using a random 4x2 data frame:
rand_pd
Β | 0 | 1
-- | -- | --
0.063849 | 0.939572
0.661251 | 0.998138
0.866290 | 0.826964
0.044704 | 0.241041
from copulae import GaussianCopula
g_cop = GaussianCopula(dim=2)
g_cop.fit(rand_pd)
LinAlgError Traceback (most recent call last)
in
1 from copulae import GaussianCopula
2 g_cop = GaussianCopula(dim=2) # initializing the copula
----> 3 g_cop.fit(rand_pd_2) # fit the copula to the data
~/.local/lib/python3.7/site-packages/copulae/copula/base.py in fit(self, data, x0, method, est_var, verbose, optim_options)
79 raise ValueError('Dimension of data does not match copula')
80
---> 81 CopulaEstimator(self, data, x0=x0, method=method, est_var=est_var, verbose=verbose, optim_options=optim_options)
82
83 return self
~/.local/lib/python3.7/site-packages/copulae/copula/estimator/estimator.py in init(self, copula, data, x0, method, est_var, verbose, optim_options)
64 self._verbose = verbose
65
---> 66 self.fit() # fit the copula
67
68 def fit(self):
~/.local/lib/python3.7/site-packages/copulae/copula/estimator/estimator.py in fit(self)
70 if m in {'ml', 'mpl'}:
71 MaxLikelihoodEstimator(self.copula, self.data, self.initial_params, self.optim_options, self._est_var,
---> 72 self._verbose).fit(m)
73 elif m in ('itau', 'irho'):
74 CorrInversionEstimator(self.copula, self.data, self._est_var, self._verbose).fit(m)
~/.local/lib/python3.7/site-packages/copulae/copula/estimator/max_likelihood.py in fit(self, method)
58 """
59
---> 60 res = self._optimize()
61
62 if not res['success']:
~/.local/lib/python3.7/site-packages/copulae/copula/estimator/max_likelihood.py in _optimize(self)
106
107 def _optimize(self) -> OptimizeResult:
--> 108 return minimize(self.copula_log_lik, self.initial_params, **self.optim_options)
~/.local/lib/python3.7/site-packages/scipy/optimize/_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
607 elif meth == 'slsqp':
608 return _minimize_slsqp(fun, x0, args, jac, bounds,
--> 609 constraints, callback=callback, **options)
610 elif meth == 'trust-constr':
611 return _minimize_trustregion_constr(fun, x0, args, jac, hess, hessp,
~/.local/lib/python3.7/site-packages/scipy/optimize/slsqp.py in _minimize_slsqp(func, x0, args, jac, bounds, constraints, maxiter, ftol, iprint, disp, eps, callback, **unknown_options)
397
398 # Compute objective function
--> 399 fx = func(x)
400 try:
401 fx = float(np.asarray(fx))
~/.local/lib/python3.7/site-packages/scipy/optimize/optimize.py in function_wrapper(*wrapper_args)
298 def function_wrapper(wrapper_args):
299 ncalls[0] += 1
--> 300 return function((wrapper_args + args))
301
302 return ncalls, function_wrapper
~/.local/lib/python3.7/site-packages/copulae/copula/estimator/max_likelihood.py in copula_log_lik(self, param)
101 try:
102 self.copula.params = param
--> 103 return -self.copula.log_lik(self.data, to_pobs=False)
104 except ValueError: # error encountered when setting invalid parameters
105 return np.inf
~/.local/lib/python3.7/site-packages/copulae/elliptical/abstract.py in log_lik(self, data, to_pobs)
41 return -np.inf
42
---> 43 return super().log_lik(data, to_pobs=to_pobs)
44
45 @Property
~/.local/lib/python3.7/site-packages/copulae/copula/base.py in log_lik(self, data, to_pobs)
198 data = self.pobs(data) if to_pobs else data
199
--> 200 return self.pdf(data, log=True).sum()
201
202 @Property
~/.local/lib/python3.7/site-packages/copulae/utility/utils.py in internal(cls, x, *args, **kwargs)
34 x = np.asarray(x)
35
---> 36 res = np.asarray(f(cls, x, *args, **kwargs))
37 return res.item(0) if res.size == 1 else res
38
~/.local/lib/python3.7/site-packages/copulae/elliptical/gaussian.py in pdf(self, u, log)
80 sigma = self.sigma
81 q = norm.ppf(u)
---> 82 d = mvn.logpdf(q, cov=sigma) - norm.logpdf(q).sum(1)
83 return d if log else np.exp(d)
84
~/.local/lib/python3.7/site-packages/scipy/stats/_multivariate.py in logpdf(self, x, mean, cov, allow_singular)
493 dim, mean, cov = self._process_parameters(None, mean, cov)
494 x = self._process_quantiles(x, dim)
--> 495 psd = _PSD(cov, allow_singular=allow_singular)
496 out = self._logpdf(x, mean, psd.U, psd.log_pdet, psd.rank)
497 return _squeeze_output(out)
~/.local/lib/python3.7/site-packages/scipy/stats/_multivariate.py in init(self, M, cond, rcond, lower, check_finite, allow_singular)
161 d = s[s > eps]
162 if len(d) < len(s) and not allow_singular:
--> 163 raise np.linalg.LinAlgError('singular matrix')
164 s_pinv = _pinv_1d(s, eps)
165 U = np.multiply(u, np.sqrt(s_pinv))
LinAlgError: singular matrix
I am using class GaussianMixtureCopula
to fit my data:
gmc = GaussianMixtureCopula(6, 2).fit(X)
and this model performs well,
but when I try to generate some samples from the fitted model (gmc.random(2000)
),
it only produces random numbers between 0 and 1.
I tried to figure out the reason but failed. Could you please fix this issue?
Hi,
When fitting a Gaussian and Student Copula, I quite often encounter the np.linalg.LinAlgError: singular matrix.
I suspect this could be solved with other initial parameters, because my current workaround is to try to fit the Copula with a randomly selected subset of the data and then use the estimates as starting values for the original dataset. Until now, this always worked.
Maybe there is an easy fix for this?
I noticed that when invoking the random method, samples are returned following a uniform distribution. Is this meant to be the case -- why is random
not mapping the samples back to the original marginal distributions?
from copulae.datasets import load_residuals
from copulae import GaussianCopula
residuals = load_residuals()
n_samples, ndim = residuals.shape
g_cop = GaussianCopula(dim=ndim)
g_cop.fit(residuals)
fig, ax = plt.subplots(1,2)
rv = g_cop.random(n_samples)
ax[0].hist(residuals.values[:, 1])
ax[0].set_title('Residuals')
ax[1].hist(rv[:,1])
ax[1].set_title('Generated')
plt.plot()
Hi Daniel,
Unfortunately, I have observed the same behavior of the fit function, even after the update. I fit both marginals using a gumbel distribution and observed theta to be identical regardless of the input to p_obs. Here is an example:
import copulae
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as st
from statsmodels.distributions.empirical_distribution import ECDF
from copulae import ClaytonCopula, FrankCopula, GumbelCopula
np.random.seed(100)
v1 = np.sort(st.norm.rvs(loc=0,scale=1.5,size=10000))
v2 = st.norm.rvs(loc=0,scale=1,size=10000)
v3 = np.sort(st.norm.rvs(loc=0,scale=1.5,size=10000))
v4 = st.norm.rvs(loc=0,scale=1,size=10000)
total_sims = pd.DataFrame([0.1*(v1+v2),0.1*(v3+v4)],index=['V1','V2']).T
gumbel_ca_unif = pd.DataFrame((st.gumbel_l.cdf(total_sims['V1'],*st.gumbel_l.fit(total_sims['V1']))),columns=['gumbel_V1'])
gumbel_es_unif = pd.DataFrame((st.gumbel_l.cdf(total_sims['V2'],*st.gumbel_l.fit(total_sims['V2']))),columns=['gumbel_V2'])
clay_cop1 = ClaytonCopula(dim=2)
clay_cop1.fit(data=gumbel_unif,p_obs=True)
print('p_obs=True:')
print(f"Clayton theta: {clay_cop1.params}")
print(f"Clayton Kendall : {clay_cop1.tau}")
print(f"Clayton LogLikihood : {clay_cop1.log_lik(gumbel_unif)}")
print('')
clay_cop2 = ClaytonCopula(dim=2)
clay_cop2.fit(data=gumbel_unif,p_obs=False)
print('p_obs=False:')
print(f"Clayton theta: {clay_cop2.params}")
print(f"Clayton Kendall : {clay_cop2.tau}")
print(f"Clayton LogLikihood : {clay_cop2.log_lik(gumbel_unif)}")
The results are as follows:
p_obs=True:
Clayton theta: 1.2019830759189278
Clayton Kendall : 0.3753870796378192
Clayton LogLikihood : 2442.710063910521
p_obs=False:
Clayton theta: 1.2019830759189278
Clayton Kendall : 0.3753870796378192
Clayton LogLikihood : 2442.710063910521
As you can see, the results are exactly identical, suggesting that the marginals are still being converted to pseudo observations regardless of the designation of p_obs. Your initial logical statement with p_obs looks good. Could there be anywhere else in the code where the marginals are unconditionally transformed to pseudo observations? Thanks!
I am not sure this is possible, but I am trying to do is save the parameters of the fit and load them in the future (as to avoid having to refit the data). So the parameters of a MarginalCopula fit are obtained using the following:
params = cop.params
Out[37]:
{'copula': 6.935870895461638,
'marginals': [{'type': 'gamma',
'parameters': {'a': 1.2455705214968726,
'loc': -0.00018850468681980907,
'scale': 10.725816166928752}},
{'type': 'genpareto',
'parameters': {'c': 1.9383054484824958,
'loc': -1.982576615044322e-09,
'scale': 1.273334393803809}}]}
Then loading them back into a copula object is kind of tricky because the parameters for a Marginal Copula is a dict, as opposed to a numpy array. Maybe there is already a hard way to this.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. πππ
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google β€οΈ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.