danielbok / copulae Goto Github PK

View Code? Open in Web Editor NEW

136.0 8.0 25.0 1.91 MB

Multivariate data modelling with Copulas in Python

Home Page: https://copulae.readthedocs.io/en/latest/

License: MIT License

Python 21.85% Makefile 0.04% Jupyter Notebook 76.26% Cython 1.85%

copula copula-models python statistics dependency-modeling copulae copulas conda pypi pypi-packages

copulae's People

Contributors

Stargazers

Watchers

copulae's Issues

cannot import name 'Ties' from 'copulae.types' (

In [3]: import copulae

ImportError Traceback (most recent call last)
in
----> 1 import copulae

~/anaconda3/lib/python3.7/site-packages/copulae/init.py in
1 from copulae._version import get_versions
----> 2 from copulae.archimedean import *
3 from copulae.elliptical import *
4 from copulae.empirical import *
5 from copulae.indep import *

~/anaconda3/lib/python3.7/site-packages/copulae/archimedean/init.py in
----> 1 from .clayton import ClaytonCopula
2 from .frank import FrankCopula
3 from .gumbel import GumbelCopula
4
5 all = ['ClaytonCopula', 'FrankCopula', 'GumbelCopula']

~/anaconda3/lib/python3.7/site-packages/copulae/archimedean/clayton.py in
2 import numpy.random as rng
3
----> 4 from copulae.copula import Summary, TailDep
5 from copulae.core import EPS, valid_rows_in_u
6 from copulae.stats import random_uniform

~/anaconda3/lib/python3.7/site-packages/copulae/copula/init.py in
1 from .abstract import AbstractCopula
----> 2 from .base import BaseCopula, TailDep
3 from .summary import Summary

~/anaconda3/lib/python3.7/site-packages/copulae/copula/base.py in
5
6 from copulae.copula.abstract import AbstractCopula
----> 7 from copulae.copula.estimator import CopulaEstimator
8 from copulae.core import pseudo_obs
9 from copulae.types import Array, Numeric

~/anaconda3/lib/python3.7/site-packages/copulae/copula/estimator/init.py in
----> 1 from .estimator import fit_copula, EstimationMethod

~/anaconda3/lib/python3.7/site-packages/copulae/copula/estimator/estimator.py in
6 from copulae.core import tri_indices
7 from copulae.stats import pearson_rho
----> 8 from copulae.types import Array, Ties
9 from copulae.utility.dict import merge_dict
10 from .corr_inversion import estimate_corr_inverse_params

ImportError: cannot import name 'Ties' from 'copulae.types' (/home/zakariae/anaconda3/lib/python3.7/site-packages/copulae/types.py)

Copula Fitting

Hey @DanielBok thank for you for making this great package for us py users.
Ok managed to get it to work perfectly with genpareto, didn't realize it was flexible to work with any distribution, until I dove in the code... if you need help with a certain aspect of it... let me know... I would be happy to contribute 🙏

Feature Request - Plot a copula

Usage example :

from copulae import GumbelCopula
cop = GumbelCopula(dim=2)
cop.fit(SOME_DATA)
cop.plot_cdf()
cop.plot_pdf()

Requirement to Fit Input Data

Hello, based on my understanding of copulas and experience using them in other languages, I believe the input data must be uniformly distributed before it can be fit to a copula. This is specified in definition 1, point 2 here. This is appears to be consistent with your example, where you fit a gaussian copula to data and state that the function will internally convert the data to psuedo observations here. I have a few questions:

What exact method is being used to conduct the probability integral transform to convert the data to pseudo observations and where can I find this in your code? I assume you are using the emprical CDF, but I wanted to be sure.
If I choose another distribution for the probability integral transform and the input data is already uniformly distributed, will the function further modify the input data in any way?
You appear to still be checking that the input data is uniformly distributed here. Does this check occur after the internal conversion to pseudo observations?

Thank you!

Copulae

Dear Daniel

I would like to ask you a question:
When I applied the Copulae Package, I encountered one problem:

But one month ago, I could use this package. So weird.

And I am writing my thesis, and I am looking forward to your reply

Thanks
Ruiting

Fit Summary

According to you documentation of this library when you do t_cop.sumary() (in this case for a student copula) it should also return the fit summary of it, however and as we can see in the code it just returns the Degree of Freedom and Correlation Matrix (normal summary) and not the other information it should return according to the documentatio (see attached image). Is there any intention of completing this or was that part in the documentation a mistake. Thanks for the help.

EmpiricalCopula usage

Hello,

I am trying to use the EmpiricalCopula class and I am having some difficulties.

Example code :

from copulae import EmpiricalCopula
import numpy as np

np.random.seed(8)
data = np.random.uniform(size=(3000, 3))
cop = EmpiricalCopula(dim=3, data=data)

print(cop.random(10))  # simulate random number
print(cop.cdf(np.array([[0.1, 0.5, 1]])))

I find that the cop.cdf always gives the same number no matter what I use in the input. Am I using it wrong?

Thanks for writing this package.
Best,

PEP 517 builds

Got the following error trying to install copulae:

$ pip wheel --use-pep517 "copulae (==0.7.7)"                                                                                                                     [14:42:45]
Collecting copulae==0.7.7
  Using cached copulae-0.7.7.tar.gz (660 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [21 lines of output]
      Traceback (most recent call last):
        File "/root/venv/lib/python3.11/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 351, in <module>
          main()
        File "/root/venv/lib/python3.11/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 333, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/root/venv/lib/python3.11/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-hde_99ze/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-hde_99ze/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-hde_99ze/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 488, in run_setup
          self).run_setup(setup_script=setup_script)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-hde_99ze/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 338, in run_setup
          exec(code, locals())
        File "<string>", line 8, in <module>
      ModuleNotFoundError: No module named 'numpy'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

numpy was installed.

hi .. idea

Hi copula devs. Just stopping by to let you know that could use this library to continuously set up estimation and submission, and maybe win a prize. It's easy to use github actions to set and forget. I'd have put this in discussion if it were enabled.

Fork https://lnkd.in/g6AEMYW
Open up https://lnkd.in/gUm4Ns5 in colab and run it to generate yourself a write key.
Save the key as a Github secret called WRITE_KEY
Click on "accept" when Github asks you if you want to enable GitHub actions. Go to Actions and you'll see the only action used in your repo (similar to https://lnkd.in/gv_Agt6). Enable it.

Or see links from this post.
https://www.linkedin.com/posts/petercotton_micropredictionmicroactors-activity-6746618350325104640-UIW3

test_gmc failures

Hello,

I found that 2 tests in the testsuite are failing. Details are here:
copulae_failures.txt
My environment is following:

appdirs==1.4.4
attrs==22.1.0
autocommand==2.2.2
cffi==1.15.1
cryptography==38.0.3
Cython==0.29.32
exceptiongroup==1.0.4
inflect==6.0.2
iniconfig==1.1.1
jaraco.context==4.2.0
jaraco.functools==3.5.2
jaraco.text==3.11.0
joblib==1.2.0
more-itertools==9.0.0
numpy==1.23.5
ordered-set==4.1.0
packaging==21.3
pandas==1.5.1
patsy==0.5.3
pluggy==1.0.0
ply==3.11
pycparser==2.21
pydantic==1.10.2
pyparsing==3.0.9
pytest==7.2.0
python-dateutil==2.8.2
pytz==2022.6
scikit-learn==1.1.3
scipy==1.9.3
six==1.16.0
statsmodels==0.13.5
threadpoolctl==3.1.0
tomli==2.0.1
trove-classifiers==2022.10.19
typing_extensions==4.4.0
validate-pyproject==0.10.1
wrapt==1.14.1

Arch Linux, GCC 12.2.0

Anton K.

Frank Copula Crash + Other Issues

Hey @DanielBok , I was experimenting with the package here, I've installed the latest from conda.
I have this dataset link
which has a dimension of 2, being mostly gamma or pareto marginals.

Running the below code:

from copulae import  FrankCopula
distTypes = ['gamma', 'gamma']
df = pd.read_csv("C:/...")
cop = MarginalCopula(FrankCopula(dim=2), distTypes)
cop.fit(df)

Gives back tracebackerror, where I inserted a print statement in params to see what is triggering the assertion:

31.92679324563328
31.92679324563328
31.92679324563328
31.926793260633282
1028.2277099641103
1028.2277099791104
nan
Traceback (most recent call last):

  File "<ipython-input-15-5f9a10d4956d>", line 2, in <module>
    cop.fit(self.df)

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\marginal\marginal.py", line 143, in fit
    self._copula.fit(data, x0, method, optim_options, ties, **kwargs)

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\copula\base.py", line 112, in fit
    CopulaEstimator(self, data, x0=x0, method=method, verbose=verbose, optim_options=optim_options)

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\copula\estimator\estimator.py", line 65, in __init__
    self.fit()  # fit the copula

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\copula\estimator\estimator.py", line 70, in fit
    MaxLikelihoodEstimator(self.copula, self.data, self.initial_params, self.options, self.verbose).fit(m)

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\copula\estimator\max_likelihood.py", line 55, in fit
    res: OptimizeResult = minimize(self.copula_log_lik, self.initial_params, **self.optim_options)

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\scipy\optimize\_minimize.py", line 626, in minimize
    constraints, callback=callback, **options)

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\scipy\optimize\slsqp.py", line 422, in _minimize_slsqp
    fx = sf.fun(x)

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\scipy\optimize\_differentiable_functions.py", line 182, in fun
    self._update_fun()

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\scipy\optimize\_differentiable_functions.py", line 166, in _update_fun
    self._update_fun_impl()

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\scipy\optimize\_differentiable_functions.py", line 73, in update_fun
    self.f = fun_wrapped(self.x)

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\scipy\optimize\_differentiable_functions.py", line 70, in fun_wrapped
    return fun(x, *args)

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\copula\estimator\max_likelihood.py", line 87, in copula_log_lik
    return -self.copula.log_lik(self.data, to_pobs=False)

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\copula\base.py", line 155, in log_lik
    return self.pdf(data, log=True).sum()

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\utility\utils.py", line 36, in internal
    res = np.asarray(f(cls, x, *args, **kwargs))

  File "C:\kaleb\miniconda\envs\eth_proba\lib\site-packages\copulae\archimedean\frank.py", line 123, in pdf
    assert not np.isnan(self.params), "Copula must have parameters to calculate parameters"

AssertionError: Copula must have parameters to calculate parameters

I am not sure if it is the way I am running the package that is the issue. Aside from that with other copulas, I've tried generating random samples and plotting them on a chart, that results are surprising. Maybe I am not using the package wrong. One example below which uses the clayton copula gave back results that don't seem to fit what you see empirically, although maybe clayton should be able to capture the degree of relation between the 2 rvs.

GitHub project documentation link is broken

The About area of the GitHub project page shows a link to https://copulae.readthedocs.io/en/stable/copulae/index.html, which is broken.

You might want to change this to https://copulae.readthedocs.io/en/latest/, as given in the README – or at least to https://copulae.readthedocs.io/en/stable/.

Support for discrete marginals

Hi,

Would the package provide support for discrete marginal distributions. Or will it support in future.

Thank you!

Not an issue.

I just wanted to thank you your work in this library. It is challenging to find good copula implementations in Python, and subjectively I believe your library is the best one I have found so far.

Thanks!

Copula Fit Replication with SAS using Uniform Marginals

Hi Daniel,

I am comparing the copula fit between copulae and SAS and I am observing different results. In the linked notebook, I create random data and conduct the probability integral transform using different distributions for the clayton copula. Regardless of which distribution I choose to conduct probability integral transform, the theta parameter for Clayton is the same. This was not my expectation but conducting the same copula fitting in SAS with the exact same uniformly distributed marginals provides a bit of context.

When fitting a copula in SAS, you can specify whether your marginals are already uniformly distributed, or if they need to converted to uniformly distributed via the empirical CDF. First, I specified the marginals as already uniformly distributed, with results here. As expected (and different than the jupyter notebook), the theta parameter of the Clayton copula changed based on which distribution I used to conduct probability integral transform. The case where the input marginals were transformed to uniformly distributed via the empirical CDF yielded the exact same theta as copulae, providing me confidence in the fitting done for copulae.

Next, I fit all the uniformly distributed marginals to the Clayton copula in SAS while specifying that the marginals needed to be converted to uniformly distributed via the empirical CDF. In this iteration, the theta parameter estimated for the clayton copula was the same in all cases and identical to the estimate of the copulae. Those results are found here.

In conclusion, it appears copulae is converting the marginals to uniformly distributed via the empirical CDF regardless of the whether the data is already uniformly distributed or not. One of the biggest decisions when fitting a copula is choice of marginal distribution which should affect the fit of the copula as demonstrated by SAS. Is there a parameter I should change in the fit function such that copulae does not automatically convert marginals to uniformly distributed which should result in a different fit based on the distributions chosen to conduct the probability integral transform? Thanks!

Constants used in the marginal cdf function of gaussian mixture copula

Hi Daniel,

I want to understand the logic behind this function. Any reference to paper/book behind the constants and overall approach would be highly appreciated.

def gmm_marginal_cdf(x: np.ndarray, param: GMCParam):
"""
Approximates the inverse cdf of the input given the GMC parameters
Parameters
----------
x : np.ndarray
Vector of quantiles
param : GMCParam
The Gaussian Mixture Copula parameters
Returns
-------
np.ndarray
Cumulative distribution function evaluated at x
"""

sigmas = np.repeat(np.sqrt([np.diag(c) for c in param.covs]).T[np.newaxis, ...], len(x), axis=0)
means = np.repeat(param.means.T[np.newaxis, ...], len(x), axis=0)

**a1 = 0.3480242
a2 = -0.0958798
a3 = 0.7478556
rho = 0.47047
sqrt2 = 1.4142136**

_zi = (np.repeat(x[..., np.newaxis], param.n_clusters, axis=2) - means) / (sigmas * sqrt2)
za = np.abs(zi)
t = 1 / (1 + rho * za)_
chunk = 0.5 * (a1 * t + a2 * t ** 2 + a3 * t ** 3) * np.exp(-(za ** 2))
return np.where(zi < 0, chunk, 1 - chunk) @ param.

different results when comparing with R ‘copula’

Hi,
I am trying to reproduce some results from an experiment I have done in r using the package ‘copula’.
all the results match except when the data has zeros:
here is an example: I use the sample parameters in both sides [empirical copula with no somthing]

When the data has zeros: (results don't match)
data:
x
0.8720320
0.9472037
0.3241581
0.9226000
0.9763799
y
0.4154456
0.0000000
0.0000000
0.0000000
0.0000000

resulted cdf in r : 0.4, 0.0, 0.0, 0.0, 0.0
resulted cdf using "copulae" package: 0.4, 0.6, 0.2, 0.4, 0.8

Another example with no zeros: (results match)
data:
x
0.8993826,
0.8591301,
0.2108277,
0.0000000,
0.2508839,
y
0.9433767
0.7802619
0.5243279
0.9659404
0.5459261

resulted cdf in r : 0.8, 0.6, 0.2, 0.2, 0.4
resulted cdf using "copulae" package: 0.8, 0.6, 0.2, 0.2, 0.4

here is my script:

from copulae import EmpiricalCopula
import pandas as pd

df = pd.DataFrame(list(zip(l, l1)), columns =['x', 'y'])
emp_cop = EmpiricalCopula(2, df, smoothing="none")
data = emp_cop.data
print(emp_cop.cdf(data))

I tried to look into the code to figure out where this difference comes from but with no success? any idea?

typo in docs

There's a typo in line 4 of
https://copulae.readthedocs.io/en/stable/copulae/archimedean/index.html

Original:
However, more specifically, an Archimedean is defined as
$C(\textbf{u}) = \psi(\psi^{-1}(u_1), \dots, \psi(d_d)) \quad \textbf{u} \in [0, 1]^d$

Correct version:

$C(\textbf{u}) = \psi(\psi^{-1}(u_1), \dots, \psi(u_d)) \quad \textbf{u} \in [0, 1]^d$

the $d\^{\text{th}}$ uniform random variable $u_d$ has been mistaken for $d_d$

BTW
I'm not familiar with the rst file, so I don't know how to correctly make a commit and PR sry.

How-to

Hello,

I am trying to use Gaussian copula mixture with non-gaussian distributions or empirical distributions. I am not sure, how can I get the dependence structure for non-gaussian distributions.

Sorry didn't find another way to ask this?

MAKE US A COLAB COMPATIBLE PACKAGE PLEASE

Frank and Gaussian Copula give Error

Hey @DanielBok ,
if i try to fit Frank Copula on a dataset of dim 2 i have this error
AssertionError: Copula must have parameters to calculate parameters
in particular when the parameter of Frank Copula should be negative the code doesn't work even if in dimension 2.
Furthermore, very often using the Gaussian copula with some data it reports the error in the _mutivariate.py script because the covariance matrix becomes non-symmetric and not positive definite during the fitting.

EmpiricalCopula does not work

Concretely, I executed the code you provided as an example:

from copulae import EmpiricalCopula
from copulae.datasets import load_marginal_data
df = load_marginal_data()
df.head(3)
STUDENT NORM EXP
0 -0.485878 2.646041 0.393322
1 -1.088878 2.906977 0.253731
2 -0.462133 3.166951 0.480696
emp_cop = EmpiricalCopula(3, df, smoothing="beta")
data = emp_cop.data # getting the pseudo-observation data (this is the converted df)
data[:3]
array([[0.32522493, 0.1886038 , 0.55781406],
[0.15161613, 0.39953349, 0.40953016],
[0.33622126, 0.65611463, 0.62645785]])

must feed pseudo-observations into cdf

emp_cop.cdf(data[:2])
array([0.06865595, 0.06320104])
emp_cop.pdf([[0.5, 0.5, 0.5]])

The problem is that whatever arguments I input into emp_cop.pdf([[0.5, 0.5, 0.5]]), I get the same output 0.009268568506099015. For example, if you run emp_cop.pdf([[0.62, 0.91, 0.34]]) I again get 0.00926856850609901. Per my understanding, this number should represent a copula density (pdf) which should vary for different combinations of pseudo-observation values. Yet, it is not the case.
Any advice from your side would be highly appreciated. I created a Collab notebook you can directly access if you have a few minutes to help: https://colab.research.google.com/drive/1GQT7T4HcaDwT2zlCPPa22hGPASBUJTAV?usp=sharing .

Reference for gaussian mixture copula

Hi, thanks for your really nice package!

I was wondering if there is a good reference you can provide for the gaussian mixture copula that describes the method.

Many thanks!

Error after update

Hi Daniel,

I updated copulae to the newest version 0.7.3 and the notebook I previously used to test copulae no longer works. Here is a snapshot of the first part of the code:

import copulae
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as st

from statsmodels.distributions.empirical_distribution import ECDF
from copulae import ClaytonCopula, FrankCopula, GumbelCopula

np.random.seed(100)

v1 = np.sort(st.norm.rvs(loc=0,scale=1.5,size=10000))
v2 = st.norm.rvs(loc=0,scale=1,size=10000)

v3 = np.sort(st.norm.rvs(loc=0,scale=1.5,size=10000))
v4 = st.norm.rvs(loc=0,scale=1,size=10000)

total_sims = pd.DataFrame([0.1*(v1+v2),0.1*(v3+v4)],index=['V1','V2']).T

plt.scatter(total_sims['V1'], total_sims['V2']);

clay_cop = ClaytonCopula(theta=1,dim=2)
clay_cop.fit(data=total_sims, x0=None)

This code yields the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-c5dff29e8c64> in <module>
     21 
     22 clay_cop = ClaytonCopula(theta=1,dim=2)
---> 23 clay_cop.fit(data=total_sims, x0=None)

~\Miniconda3\lib\site-packages\copulae\copula\base.py in fit(self, data, x0, method, optim_options, ties, verbose, to_pobs, **kwargs)
    144 
    145         x0 = np.asarray(x0) if x0 is not None and not isinstance(x0, np.ndarray) and isinstance(x0, Collection) else x0
--> 146         self._fit_smry = fit_copula(self, data, x0, method, verbose, optim_options, kwargs.get('scale', 1))
    147 
    148         if isinstance(data, pd.DataFrame):

~\Miniconda3\lib\site-packages\copulae\copula\estimator\estimator.py in fit_copula(copula, data, x0, method, verbose, optim_options, scale)
     92     m = method.lower()
     93     if m in {'ml'}:
---> 94         x0 = initial_params(copula, data, x0)
     95         return estimate_max_likelihood_params(copula, data, x0, options, verbose, scale)
     96     elif m in ('itau', 'irho'):

~\Miniconda3\lib\site-packages\copulae\copula\estimator\estimator.py in initial_params(copula, data, x0)
    117 
    118     try:
--> 119         start = estimate_corr_inverse_params(copula, data, 'itau').params
    120         ll = copula.log_lik(data)
    121 

~\Miniconda3\lib\site-packages\copulae\copula\estimator\corr_inversion.py in estimate_corr_inverse_params(copula, data, type_)
     32         raise ValueError("Correlation Inversion must be either 'itau' or 'irho'")
     33 
---> 34     icor = fit_cor(copula, data, type_)
     35 
     36     if is_elliptical(copula):

~\Miniconda3\lib\site-packages\copulae\copula\estimator\corr_inversion.py in fit_cor(copula, data, typ)
     69     indices = tri_indices(copula.dim, 1, 'lower')
     70     if typ == 'itau':
---> 71         tau = kendall_tau(data)[indices]
     72         theta = copula.itau(tau)
     73     elif typ == 'irho':

~\Miniconda3\lib\site-packages\copulae\stats\correlation.py in kendall_tau(x, y, use)
    162     %s
    163     """
--> 164     return corr(x, y, 'kendall', use)
    165 
    166 

~\Miniconda3\lib\site-packages\copulae\stats\correlation.py in corr(x, y, method, use)
    101             raise ValueError('x must be a matrix with dimension 2')
    102         c = np.identity(x.shape[1])
--> 103         for (i, j), (c1, c2) in _yield_vectors(x, use):
    104             c[i, j] = c[j, i] = compute_corr(c1, c2)
    105 

~\Miniconda3\lib\site-packages\copulae\stats\correlation.py in _yield_vectors(x, use)
    256                 yield (i, j), (x[v, i], x[v, j])
    257             else:
--> 258                 yield (i, j), (x[:, i], x[:, j])
    259 
    260 

~\Miniconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2900             if self.columns.nlevels > 1:
   2901                 return self._getitem_multilevel(key)
-> 2902             indexer = self.columns.get_loc(key)
   2903             if is_integer(indexer):
   2904                 indexer = [indexer]

~\Miniconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2893             casted_key = self._maybe_cast_indexer(key)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
   2897                 raise KeyError(key) from err

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

TypeError: '(slice(None, None, None), 0)' is an invalid key

Given this error did not occur with the previous version, is it possible there is a bug in the update? I played around with the inputs to the fit function with no success. Thanks!

Fixing Correlation Matrix if copulae.fit()

Hi Daniel,

I am working on a project and I want to give a fixed correlation matrix to a student t copula. This is easily done after the fit and overwriting the correlation matrix. However, is it possible, in the same manner as fixing the degrees of freedom beforehand, to fix the correlation matrix? This will heavily improve my computation time in copulae.fit(). Thanks in advance.

Warm regards,
Loek

Question about BaseCopula.random() Method

I'm new to using copulas and have been finding this package really helpful. Going through the explainer on copulas here I noticed that the following steps were followed when generating random variables from copulas:

Draw random samples from a multivariate distribution that describes the "dependency structure" between the target random variables in the dataset. The example used a multivariate normal distribution to describe the dependency structure.
Convert the above random samples to probabilities using the dependency structure's marginal CDFs. The example called the norm.cdf on each column of the random samples obtained from STEP1 to arrive at x1 and x2
Use the results obtained in STEP2 to arrive at the ultimate random values desired for the target random variables by using their marginal distributions' PPFs. The example assumed one of the variables was distributed according to Student distribution, and the other according to Laplace.

My question is if I'm trying to implement the above workflow entirely using the copulae package, and I start by creating, say an Gumbel copula object, and then call the random() method on it and ask it to generate a 1000 samples, do I get the equivalent of what STEP1 above created or STEP2?

Do I take the result of random() and just apply my target distributions' PPFs to it like the below?

from copulae import GumbelCopula
import pandas as pd
from scipy import stats

alphas = [2.43, 1.97, 2.21, 3.01, 3.22] # Target random variables are 5 pareto distributed vars with these parameters

ndim = len(alphas)
gmbl_cop = GumbelCopula(theta=2, dim=ndim)

u_df = pd.DataFrame(gmbl_cop.random(1000))

x_df = pd.DataFrame()
for i in range(ndim):
    col = df.columns[i]
    x_df[col] = df[col].apply(lambda x: stats.pareto.ppf(x, alphas[i]-1))

Thanks

Python 3.6 not supported?

Hi,

I am getting the following error message when trying to pip install copulae:

Collecting copulae
  Using cached https://files.pythonhosted.org/packages/66/e4/d727292d72ed1319bea836b0cdbfc304d1382f4cdf8d40017a57bbf9b0e9/copulae-0.7.5-cp36-cp36m-manylinux1_x86_64.whl
Collecting scipy>=1.5 (from copulae)
  Using cached https://files.pythonhosted.org/packages/bb/bb/944f559d554df6c9adf037aa9fc982a9706ee0e96c0d5beac701cb158900/scipy-1.7.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-ivoevpoe/scipy/setup.py", line 31, in <module>
        raise RuntimeError("Python version >= 3.7 required.")
    RuntimeError: Python version >= 3.7 required.

This is contradictory to the docs - https://copulae.readthedocs.io/en/latest/getting-started.html which state 3.6 will be supported until 23 December 2021.

How can this be fixed? Thanks

An error occurred while importing copulae

File "copulae\special_specfunc.pyx", line 1, in init copulae.special._specfunc
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
copulae version：0.7.8
python：3.9
THANKS

ModuleNotFoundError: No module named 'copulae.special._specfunc'

When using 'from copulae import GaussianCopula' I get this warning:

ModuleNotFoundError: No module named 'copulae.special._specfunc'

Anyone knows how to solve this?

when I fit the model, it runs so slow

g_cop = elliptical.GaussianCopula(dim=ndim)
print("fit the copula to inputmatrix (shape = %s)" % (inputmatrix.shape,)) #(150,120)
g_cop.fit(inputmatrix)

gumbel formula correction

https://copulae.readthedocs.io/en/latest/copulae/archimedean/gumbel.html

It says the formula is
C_\theta (u_1, \dots, u_d) = \exp( - (\sum_u^i (-\log u_{i})^{-\theta} )^{1/\theta})

It seems this formula should be
C_\theta (u_1, \dots, u_d) = \exp( - (\sum_u^i (-\log u_{i})^{\theta} )^{1/\theta})

Printing Issue

Hey @DanielBok , happy new year & MC , just wanted to ask you about the last release, whenever I run a frank cop, it seems to start print things in stdout... maybe they are residuals that are being printed, maybe some verbose is set to 1 in the frankcopula... I tried looking to where that was, but didn't manage 😅
Thanks again

Heads up that your version requirement rules out colab by default

Google colab...

Collecting copulae
Downloading https://files.pythonhosted.org/packages/0b/b3/00723b63c2d985efaba8df6c48aaefdd7fc35bbf09b6ed4936fa0b6400c6/copulae-0.5.0.tar.gz (774kB)
|████████████████████████████████| 778kB 7.7MB/s
ERROR: Package 'copulae' requires a different Python: 3.6.9 not in '>=3.7'

How to Inverse Normalisation？[Quantile Function]

Hi,
Thank you very much for your work! This has helped me immensely! As far as I know, the results of copula are normalized. For example, the range of my data before normalization is [0,100], how can I calculate the probability density corresponding to data 8 after using copula?

the summary( ) got converging error

Ver 0.7.7 - Kendall Tau not supported for Student Copula

When using method = "itau" for Student Copula it returns the following error for line 38 of copulae\copula\estimator\corr_inversion.py in estimate_corr_inverse_params(copula, data, type_):

36 if is_elliptical(copula):
37 estimate = icor
---> 38 copula.params[:] = estimate
39 elif is_archimedean(copula):
40 estimate = np.mean(icor)

TypeError: 'StudentParams' object does not support item assignment

Using GaussianCopula with method = "itau" does not return an error

singular matrices in 2 dimensional elliptical copulas

Similar to a previous user, I am trying to use the Gaussian or Student-t copula fit function, and they are crashing due to singular matrices for my data set. My data set is 2 dimensional with 250 entries, and the two columns are not linear combinations of each other, or approximate linear combinations.

This is a small example I generated using a random 4x2 data frame:

rand_pd
| 0 | 1
-- | -- | --
0.063849 | 0.939572
0.661251 | 0.998138
0.866290 | 0.826964
0.044704 | 0.241041

from copulae import GaussianCopula
g_cop = GaussianCopula(dim=2)
g_cop.fit(rand_pd)

LinAlgError Traceback (most recent call last)
in
1 from copulae import GaussianCopula
2 g_cop = GaussianCopula(dim=2) # initializing the copula
----> 3 g_cop.fit(rand_pd_2) # fit the copula to the data

~/.local/lib/python3.7/site-packages/copulae/copula/base.py in fit(self, data, x0, method, est_var, verbose, optim_options)
79 raise ValueError('Dimension of data does not match copula')
80
---> 81 CopulaEstimator(self, data, x0=x0, method=method, est_var=est_var, verbose=verbose, optim_options=optim_options)
82
83 return self

~/.local/lib/python3.7/site-packages/copulae/copula/estimator/estimator.py in init(self, copula, data, x0, method, est_var, verbose, optim_options)
64 self._verbose = verbose
65
---> 66 self.fit() # fit the copula
67
68 def fit(self):

~/.local/lib/python3.7/site-packages/copulae/copula/estimator/estimator.py in fit(self)
70 if m in {'ml', 'mpl'}:
71 MaxLikelihoodEstimator(self.copula, self.data, self.initial_params, self.optim_options, self._est_var,
---> 72 self._verbose).fit(m)
73 elif m in ('itau', 'irho'):
74 CorrInversionEstimator(self.copula, self.data, self._est_var, self._verbose).fit(m)

~/.local/lib/python3.7/site-packages/copulae/copula/estimator/max_likelihood.py in fit(self, method)
58 """
59
---> 60 res = self._optimize()
61
62 if not res['success']:

~/.local/lib/python3.7/site-packages/copulae/copula/estimator/max_likelihood.py in _optimize(self)
106
107 def _optimize(self) -> OptimizeResult:
--> 108 return minimize(self.copula_log_lik, self.initial_params, **self.optim_options)

~/.local/lib/python3.7/site-packages/scipy/optimize/_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
607 elif meth == 'slsqp':
608 return _minimize_slsqp(fun, x0, args, jac, bounds,
--> 609 constraints, callback=callback, **options)
610 elif meth == 'trust-constr':
611 return _minimize_trustregion_constr(fun, x0, args, jac, hess, hessp,

~/.local/lib/python3.7/site-packages/scipy/optimize/slsqp.py in _minimize_slsqp(func, x0, args, jac, bounds, constraints, maxiter, ftol, iprint, disp, eps, callback, **unknown_options)
397
398 # Compute objective function
--> 399 fx = func(x)
400 try:
401 fx = float(np.asarray(fx))

~/.local/lib/python3.7/site-packages/scipy/optimize/optimize.py in function_wrapper(*wrapper_args)
298 def function_wrapper(wrapper_args):
299 ncalls[0] += 1
--> 300 return function((wrapper_args + args))
301
302 return ncalls, function_wrapper

~/.local/lib/python3.7/site-packages/copulae/copula/estimator/max_likelihood.py in copula_log_lik(self, param)
101 try:
102 self.copula.params = param
--> 103 return -self.copula.log_lik(self.data, to_pobs=False)
104 except ValueError: # error encountered when setting invalid parameters
105 return np.inf

~/.local/lib/python3.7/site-packages/copulae/elliptical/abstract.py in log_lik(self, data, to_pobs)
41 return -np.inf
42
---> 43 return super().log_lik(data, to_pobs=to_pobs)
44
45 @Property

~/.local/lib/python3.7/site-packages/copulae/copula/base.py in log_lik(self, data, to_pobs)
198 data = self.pobs(data) if to_pobs else data
199
--> 200 return self.pdf(data, log=True).sum()
201
202 @Property

~/.local/lib/python3.7/site-packages/copulae/utility/utils.py in internal(cls, x, *args, **kwargs)
34 x = np.asarray(x)
35
---> 36 res = np.asarray(f(cls, x, *args, **kwargs))
37 return res.item(0) if res.size == 1 else res
38

~/.local/lib/python3.7/site-packages/copulae/elliptical/gaussian.py in pdf(self, u, log)
80 sigma = self.sigma
81 q = norm.ppf(u)
---> 82 d = mvn.logpdf(q, cov=sigma) - norm.logpdf(q).sum(1)
83 return d if log else np.exp(d)
84

~/.local/lib/python3.7/site-packages/scipy/stats/_multivariate.py in logpdf(self, x, mean, cov, allow_singular)
493 dim, mean, cov = self._process_parameters(None, mean, cov)
494 x = self._process_quantiles(x, dim)
--> 495 psd = _PSD(cov, allow_singular=allow_singular)
496 out = self._logpdf(x, mean, psd.U, psd.log_pdet, psd.rank)
497 return _squeeze_output(out)

~/.local/lib/python3.7/site-packages/scipy/stats/_multivariate.py in init(self, M, cond, rcond, lower, check_finite, allow_singular)
161 d = s[s > eps]
162 if len(d) < len(s) and not allow_singular:
--> 163 raise np.linalg.LinAlgError('singular matrix')
164 s_pinv = _pinv_1d(s, eps)
165 U = np.multiply(u, np.sqrt(s_pinv))

LinAlgError: singular matrix

Gaussian Mixture Copula Model gives wrong sampling

I am using class GaussianMixtureCopula to fit my data:
gmc = GaussianMixtureCopula(6, 2).fit(X) and this model performs well,
but when I try to generate some samples from the fitted model (gmc.random(2000)),
it only produces random numbers between 0 and 1.
I tried to figure out the reason but failed. Could you please fix this issue?

SingularMatrix error when Copula can be fitted with alternative start values

Hi,

When fitting a Gaussian and Student Copula, I quite often encounter the np.linalg.LinAlgError: singular matrix.

I suspect this could be solved with other initial parameters, because my current workaround is to try to fit the Copula with a randomly selected subset of the data and then use the estimates as starting values for the original dataset. Until now, this always worked.

Maybe there is an easy fix for this?

GaussianCopula.Random returns samples uniformly distributed

I noticed that when invoking the random method, samples are returned following a uniform distribution. Is this meant to be the case -- why is random not mapping the samples back to the original marginal distributions?

from copulae.datasets import load_residuals
from copulae import GaussianCopula

residuals = load_residuals()

n_samples, ndim = residuals.shape
g_cop = GaussianCopula(dim=ndim)
g_cop.fit(residuals)

fig, ax = plt.subplots(1,2)
rv = g_cop.random(n_samples)
ax[0].hist(residuals.values[:, 1])
ax[0].set_title('Residuals')
ax[1].hist(rv[:,1])
ax[1].set_title('Generated')
plt.plot()

Designation of p_obs does not affect the fit

Hi Daniel,

Unfortunately, I have observed the same behavior of the fit function, even after the update. I fit both marginals using a gumbel distribution and observed theta to be identical regardless of the input to p_obs. Here is an example:

import copulae
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as st

from statsmodels.distributions.empirical_distribution import ECDF
from copulae import ClaytonCopula, FrankCopula, GumbelCopula

np.random.seed(100)

v1 = np.sort(st.norm.rvs(loc=0,scale=1.5,size=10000))
v2 = st.norm.rvs(loc=0,scale=1,size=10000)

v3 = np.sort(st.norm.rvs(loc=0,scale=1.5,size=10000))
v4 = st.norm.rvs(loc=0,scale=1,size=10000)

total_sims = pd.DataFrame([0.1*(v1+v2),0.1*(v3+v4)],index=['V1','V2']).T

gumbel_ca_unif = pd.DataFrame((st.gumbel_l.cdf(total_sims['V1'],*st.gumbel_l.fit(total_sims['V1']))),columns=['gumbel_V1'])
gumbel_es_unif = pd.DataFrame((st.gumbel_l.cdf(total_sims['V2'],*st.gumbel_l.fit(total_sims['V2']))),columns=['gumbel_V2'])

clay_cop1 = ClaytonCopula(dim=2)
clay_cop1.fit(data=gumbel_unif,p_obs=True)

print('p_obs=True:')
print(f"Clayton theta: {clay_cop1.params}")
print(f"Clayton Kendall : {clay_cop1.tau}")
print(f"Clayton LogLikihood : {clay_cop1.log_lik(gumbel_unif)}")
print('')

clay_cop2 = ClaytonCopula(dim=2)
clay_cop2.fit(data=gumbel_unif,p_obs=False)

print('p_obs=False:')
print(f"Clayton theta: {clay_cop2.params}")
print(f"Clayton Kendall : {clay_cop2.tau}")
print(f"Clayton LogLikihood : {clay_cop2.log_lik(gumbel_unif)}")

The results are as follows:

p_obs=True:
Clayton theta: 1.2019830759189278
Clayton Kendall : 0.3753870796378192
Clayton LogLikihood : 2442.710063910521

p_obs=False:
Clayton theta: 1.2019830759189278
Clayton Kendall : 0.3753870796378192
Clayton LogLikihood : 2442.710063910521

As you can see, the results are exactly identical, suggesting that the marginals are still being converted to pseudo observations regardless of the designation of p_obs. Your initial logical statement with p_obs looks good. Could there be anywhere else in the code where the marginals are unconditionally transformed to pseudo observations? Thanks!

Feature Request - Parameter Loading - MarginalCopula

I am not sure this is possible, but I am trying to do is save the parameters of the fit and load them in the future (as to avoid having to refit the data). So the parameters of a MarginalCopula fit are obtained using the following:

params = cop.params
Out[37]: 
{'copula': 6.935870895461638,
 'marginals': [{'type': 'gamma',
   'parameters': {'a': 1.2455705214968726,
    'loc': -0.00018850468681980907,
    'scale': 10.725816166928752}},
  {'type': 'genpareto',
   'parameters': {'c': 1.9383054484824958,
    'loc': -1.982576615044322e-09,
    'scale': 1.273334393803809}}]}

Then loading them back into a copula object is kind of tricky because the parameters for a Marginal Copula is a dict, as opposed to a numpy array. Maybe there is already a hard way to this.

danielbok / copulae Goto Github PK

copulae's People

Contributors

Stargazers

Watchers

Forkers

copulae's Issues

In [3]: import copulae

must feed pseudo-observations into cdf

Recommend Projects

Recommend Topics

Recommend Org