
fentechsolutions / causaldiscoverytoolbox

1.1K stars · 36 watchers · 200 forks · 14.22 MB

Package for causal inference in graphs and in the pairwise setting. Tools for graph structure recovery and dependency analysis are included.

Home Page: https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/index.html

License: MIT License

Python 95.80% R 3.70% Shell 0.35% Dockerfile 0.09% Roff 0.06%
causal-inference graph causality causal-models algorithm machine-learning graph-structure-recovery python causal-discovery toolbox

causaldiscoverytoolbox's People

Contributors

aldro61, diviyank, eric-carlsson, goudetolivier, jiahy0825, kaydhi, koutrgor8012, kurowasan, pacowong, ritik99, tashay, thesignpainter, yknot


causaldiscoverytoolbox's Issues

GS/MMPC algorithm removes all non-related variables

Hi there,
This is my very first post on GitHub, so pardon me if this is a simple fix to my problem.

I am currently exploring the different graph inference algorithms on my own dataset. One thing I realized is that when running algorithms like GS/MMPC, uncorrelated variables are not shown (they are automatically removed) in the output of the nx.adjacency_matrix(output).todense() command.

For example, I fed 50 variables into the MMPC algorithm, of which only 35 are correlated and 15 are not. nx.adjacency_matrix(output).todense() only returns the matrix for the 35 variables, and I cannot tell which variables were uncorrelated and removed.

While CDT does provide plotting options, I prefer to use packages like Graphviz. Thus it would be helpful to obtain the matrix for all 50 input variables instead.

Is there a way for me to obtain such a matrix?
Thank you in advance!
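A possible workaround (a sketch, assuming the output is a networkx graph and data is the original DataFrame): re-add the pruned variables as isolated nodes and pass nodelist to nx.adjacency_matrix, so the matrix covers all input columns in a fixed order.

import networkx as nx
import pandas as pd

# Stand-ins for the real objects: data is the input DataFrame,
# output is the graph returned by GS/MMPC (here with column 'c' pruned).
data = pd.DataFrame(columns=['a', 'b', 'c'])
output = nx.Graph([('a', 'b')])

# Re-add the removed variables as isolated nodes, then fix the row/column
# order with nodelist so every input column gets a row and a column.
output.add_nodes_from(data.columns)
adj = nx.adjacency_matrix(output, nodelist=data.columns).todense()
print(adj)  # 3x3 matrix; the 'c' row and column are all zeros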

load_dataset in cdt.data is "random"

Hello. I am currently working on multiple pairwise algorithms, and found the following problem.
Whenever I load the TCEP data as follows,
data, labels = load_dataset('tuebingen')

the apparent order of the pairs is different.
I ran into this problem because of the following: to test threshold-dependent algorithms (such as ANM, GNN), I had a single Jupyter notebook in which I would load a single instance of data, labels, and then threshold and compute metrics on the predictions from different pre-recorded scores.

But each time I called data, labels = load_dataset('tuebingen') and then compute_metrics(preds, labels), they would all change?!
I was quite worried when the accuracy of the cdt implementations of RCC, ANM and IGCI was as low as 40% on TCEP...

Thank you for this wonderful work by the way!
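In case it helps others hitting this: the pair order can presumably be pinned with the loader's shuffle keyword (the same shuffle=False argument another issue further down this page uses):

from cdt.data import load_dataset

# shuffle=False keeps the pairs in a deterministic order across calls,
# so pre-recorded scores and freshly loaded labels stay aligned.
data, labels = load_dataset('tuebingen', shuffle=False)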

Reproducing Jarfo results on Kaggle challenge data

Hi,
I've been trying to reproduce Jarfo results from the Kaggle challenge (2013) with its final dataset from http://www.causality.inf.ethz.ch/CEdata/, i.e., with CEfinal_train_text.zip and CEfinal_test_text.zip.

It seems to output very different results compared to the original code, while the learning parameters and the features used seem to be the same.

import numpy as np
import pandas as pd
from cdt.causality.pairwise.Jarfo import Jarfo
from cdt.utils.io import read_causal_pairs
from cdt import SETTINGS

SETTINGS.GPU = False
SETTINGS.NJOBS = 1

train_data = read_causal_pairs(".../CEfinal_train_pairs.csv")
train_target = pd.read_csv(".../CEfinal_train_target.csv").iloc[:,:2].set_index("SampleID")
test_data = read_causal_pairs(".../CEfinal_test_pairs.csv")
test_target = pd.read_csv(".../CEfinal_test_target.csv").iloc[:,:2].set_index("SampleID")

j = Jarfo()
j.fit(train_data, train_target)

jp = j.predict(test_data)

acc = np.mean(jp * test_target.values > 0)
print(acc)

0.25491827465325406

I've tested it with
python 3.7.3
cdt 0.5.14

Thank you in advance for any hint.
Best
Tom

using other data sets - pandas data frame

Hi @Diviyan-Kalainathan

How do we use this package with other data sets?
Do we just create a pandas data frame and use the load_data function?

Many thanks,
Best,
Andrew
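For reference, the graph-discovery models consume a plain pandas DataFrame with one column per variable, so no dedicated loader is needed (a sketch with a hypothetical CSV path; PC is just one example model and requires the R pcalg backend):

import pandas as pd
from cdt.causality.graph import PC

df = pd.read_csv('my_dataset.csv')  # hypothetical file, one column per variable

model = PC()
graph = model.predict(df)  # networkx DiGraph over the DataFrame's columns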

All labels in TCEP are equal

Hi,
I was doing some more experiments using CGNN-based algorithms on TCEP, and I couldn't replicate the results. I also got only 60% accuracy using the hyperparameters that got me ~75% a few months ago. I couldn't find the cause of this, but when I printed the labels they were all 1. Is this a mistake?

Edit: I checked the commit warnings; apparently you permuted all pairs so that the label is always 1 and is still the right label. I'll compare with other instances of the dataset, but I'm reassured now!

[enhancement] Causal Estimation Code?

Hey. Maybe this is a dumb question, but has any thought been put into performing causal estimation in a graph in this package? It's great to have a package like this that does causal discovery, but I'd also like functionality that can generate the conditional probability estimates over the causal graph, as well as a method for generating answers to Pr(Y|do(X=x)) for x and y being continuous or discrete. I've found the ability to do this kind of general calculation to be absent from most Python packages.

I currently use the pomegranate package to build a causal Bayes net for my discrete variables (and I discretize continuous variables when I have them), then use another package to determine backdoor- or frontdoor-based variables I can condition on, and then apply the adjustment formula. It all seems glued together, however, because none of this exists in the same package, especially for graphs with all discrete variables, which I would think is a simpler case than the continuous one.

I know that 'DoWhy' exists, but it only allows for estimation of cause and effect for the direct effect case, where a treatment variable directly impacts an outcome variable.
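For the discrete case described above, the adjustment formula Pr(Y|do(X=x)) = sum_z Pr(Y|X=x, Z=z) Pr(Z=z) is only a few lines of pandas once a valid backdoor set Z is known. A minimal sketch (a hypothetical helper, not an existing CDT API):

import pandas as pd

def p_y_do_x(df, y, x, x_val, backdoor):
    """Backdoor adjustment on discrete data:
    P(Y | do(X=x_val)) = sum_z P(Y | X=x_val, Z=z) * P(Z=z)."""
    result = None
    for _, stratum in df.groupby(backdoor):
        p_z = len(stratum) / len(df)                  # P(Z=z)
        cond = stratum[stratum[x] == x_val]
        if len(cond) == 0:
            continue                                   # stratum never observed with X=x_val
        p_y = cond[y].value_counts(normalize=True)     # P(Y | X=x_val, Z=z)
        term = p_y * p_z
        result = term if result is None else result.add(term, fill_value=0)
    return result

# Toy usage: Z confounds X and Y.
df = pd.DataFrame({'Z': [0, 0, 1, 1, 0, 1],
                   'X': [0, 1, 0, 1, 1, 1],
                   'Y': [0, 1, 0, 1, 1, 0]})
print(p_y_do_x(df, 'Y', 'X', 1, ['Z']))  # P(Y | do(X=1))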

Docker image usage

Hi Diviyan,

Thank you for your project!

I am not very familiar with Docker, but I think it's worth trying.
I did the following:

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
divkal/cdt-py3.6    0.4.0               2af9c5c51ac1        2 weeks ago         3.34GB
$ docker run -it --entrypoint /bin/bash 2af9c5c51ac1
root@35b4b45ecfc8:/#

But it seems there is no python or conda in the image:

root@35b4b45ecfc8:/# python
bash: python: command not found
root@35b4b45ecfc8:/# conda
bash: conda: command not found

I believe it must be my fault. Could you please give some pointers?

Best,
Abel

FSGNN

Hi,

When trying to run:

Fsgnn = FSGNN(train_epochs=1000, test_epochs=500, l1=0.1, batch_size=1000)

From the example notebook of the LUCAS data (with my own data, though) I get the following error:

TypeError: __init__() got an unexpected keyword argument 'batch_size'
I wonder whether there have been some changes to the definition of the FSGNN object. Second, I would like to know how the package manages categorical data, as, from what I have noticed, the values are converted into floats at some point during object generation.

Thanks in advance,
Sergio
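A quick way to check which keywords the installed release actually accepts (plain Python introspection, nothing CDT-specific):

import inspect
from cdt.independence.graph import FSGNN

# Shows the constructor parameters of the installed version, so a removed
# or renamed keyword such as batch_size is immediately visible.
print(inspect.signature(FSGNN.__init__))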

In cdt.causality.pairwise, the "RCC" example uses "Jarfo"

Hi,
minor problem, just a confusing example I found through experiments, see below

class RCC(PairwiseModel):
    """Randomized Causation Coefficient model. 2nd approach in the Fast
    Causation challenge.
    **Description:** The Randomized causation coefficient (RCC) relies on the
    projection of the empirical distributions into a RKHS using random cosine
    embeddings, then classifies the pairs using a random forest based on those
    features.
    **Data Type:** Continuous, Categorical, Mixed
    **Assumptions:** This method needs a substantial amount of labelled causal
    pairs to train itself. Its final performance depends on the training set
    used.
    Args:
        rand_coeff (int): number of randomized coefficients
        nb_estimators (int): number of estimators
        nb_min_leaves (int): number of min samples leaves of the estimator
        max_depth (): (optional) max depth of the model
        s (float): scaling
        njobs (int): number of jobs to be run on parallel (defaults to ``cdt.SETTINGS.NJOBS``)
        verbose (bool): verbosity (defaults to ``cdt.SETTINGS.verbose``)
    .. note::
       Ref : Lopez-Paz, David and Muandet, Krikamol and Schölkopf, Bernhard and Tolstikhin, Ilya O,
       "Towards a Learning Theory of Cause-Effect Inference", ICML 2015.
    Example:
        >>> from cdt.causality.pairwise import RCC
        >>> import networkx as nx
        >>> import matplotlib.pyplot as plt
        >>> from cdt.data import load_dataset
        >>> from sklearn.model_selection import train_test_split
        >>> data, labels = load_dataset('tuebingen')
        >>> X_tr, X_te, y_tr, y_te = train_test_split(data, labels, train_size=.5)
        >>>
        >>> obj = Jarfo()
        >>> obj.fit(X_tr, y_tr)
        >>> # This example uses the predict() method
        >>> output = obj.predict(X_te)
        >>>
        >>> # This example uses the orient_graph() method. The dataset used
        >>> # can be loaded using the cdt.data module
        >>> data, graph = load_dataset('sachs')
        >>> output = obj.orient_graph(data, nx.DiGraph(graph))
        >>>
        >>> # To view the directed graph run the following command
        >>> nx.draw_networkx(output, font_size=8)
        >>> plt.show()
    """

Example

Hey man, in trying your example I have run into an error. I am not sure what to make of it, as it might be device-related. I will include the error message here and the file link.

https://github.com/snowde/firmai.github.io/blob/master/Discovery_LUCAS.ipynb


# So the question is, if you only have the data can you find the
# structure of the graph
from cdt.independence.graph import FSGNN

Fsgnn = FSGNN()

start_time = time.time()
ugraph = Fsgnn.predict(data, train_epochs=2000, test_epochs=1000, threshold=5e-4, l1=0.01)
print("--- Execution time : %4.4s seconds ---" % (time.time() - start_time))
nx.draw_networkx(ugraph, font_size=8) # The plot function allows for quick visualization of the graph.
plt.show()
# List results
pd.DataFrame(list(ugraph.edges(data='weight')))

`---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 350, in __call__
    return self.func(*args, **kwargs)
  File "/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/cdt/independence/graph/model.py", line 43, in run_feature_selection
    return self.predict_features(df_features, df_target, **kwargs)
  File "/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/cdt/independence/graph/FSGNN.py", line 79, in predict_features
    x = th.FloatTensor(scale(df_features.as_matrix())).to(device)
AttributeError: 'torch.FloatTensor' object has no attribute 'to'
"""

The above exception was the direct cause of the following exception:

JoblibAttributeError                      Traceback (most recent call last)
/Volumes/extra/FirmAI/Causal Inference/CausalDiscoveryToolbox-master/examples/ in <module>()
      7 start_time = time.time()
----> 8 ugraph = Fsgnn.predict(data, train_epochs=2000, test_epochs=1000, threshold=5e-4, l1=0.01)
      9 print("--- Execution time : %4.4s seconds ---" % (time.time() - start_time))

~/anaconda/envs/py36/lib/python3.6/site-packages/cdt/independence/graph/model.py in predict(self, df_data, threshold, **kwargs)
     53         result_feature_selection = Parallel(n_jobs=nb_jobs)(delayed(self.run_feature_selection)
     54                                                             (df_data, node, idx, **kwargs)
---> 55                                                             for idx, node in enumerate(list_nodes))

~/anaconda/envs/py36/lib/python3.6/site-packages/joblib/parallel.py in retrieve(self)
    738                 exception = exception_type(report)
    739
--> 740                 raise exception

JoblibAttributeError: JoblibAttributeError

Sub-process traceback (AttributeError, Sat Jun 30 16:19:48 2018, PID 92373, Python 3.6.3):

/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/cdt/independence/graph/FSGNN.py in predict_features(...)
     77         """For one variable, predict its neighbours."""
     78         device, verbose = SETTINGS.get_default(('device', device), ('verbose', verbose))
---> 79         x = th.FloatTensor(scale(df_features.as_matrix())).to(device)

AttributeError: 'torch.FloatTensor' object has no attribute 'to'
___________________________________________________________________________`
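For context on the failing line: Tensor.to(device) only exists from PyTorch 0.4.0 onward, so the AttributeError indicates an older torch install, and DataFrame.as_matrix() was deprecated in pandas and later removed, so upgrading PyTorch (and eventually cdt) is the real fix. A minimal stand-alone equivalent of the failing line that avoids both calls (a local sketch, not an official patch):

import pandas as pd
import torch as th
from sklearn.preprocessing import scale

df_features = pd.DataFrame({'a': [0.1, 0.2, 0.3], 'b': [1.0, 0.5, 0.2]})  # stand-in data
device = 'cpu'

# .values replaces the removed DataFrame.as_matrix(); the explicit .cuda()
# branch replaces Tensor.to(device), which requires PyTorch >= 0.4.0.
x = th.FloatTensor(scale(df_features.values))
if device.startswith('cuda'):
    x = x.cuda()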

CGNN Module : unexpected argument (nb_runs,nb_max_runs, etc)

When I tried to run the example, this line
Cgnn.predict(data, graph=ugraph, nb_runs=12, nb_max_runs=20, train_epochs=1500, test_epochs=1000)
produces an unexpected-argument error. Were there any changes to the CGNN module that cause this? I tried investigating the module, but I can't seem to figure out why this happens. Is this reproducible on your machine?
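For what it's worth, later releases appear to take these hyperparameters in the constructor rather than in predict(), with the run counts folded into a single nruns argument (the same spelling used in a CGNN snippet further down this page); a sketch:

from cdt.causality.graph import CGNN

# Hyperparameters go to the constructor; predict() only takes the data
# and, optionally, a skeleton graph. data/ugraph are from the example.
cgnn = CGNN(nruns=12, train_epochs=1500, test_epochs=1000)
dgraph = cgnn.predict(data, graph=ugraph)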

Recommended training parameters for GNN, balancing accuracy and running time

Hi Diviyan,

I tried GNN on TCEP dataset, with default parameters:

def test_pairwise_GNN():
    from cdt.causality.pairwise import GNN
    from cdt.data import load_dataset
    tueb, labels = load_dataset('tuebingen')
    method = GNN
    print(method)
    m = method()
    r = m.predict_dataset(tueb)
    assert r is not None
    print(r)
    return 0

But it requires nearly half a day to test a single pair on my PC!

Could you recommend a set of parameters that run as fast as possible without much sacrifice on accuracy?

Thank you,
Abel
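Lacking an official recommendation, the knobs that dominate runtime are the number of restarts and the epoch counts; a cheaper configuration would look like the sketch below (the values are illustrative, not tuned or endorsed by the authors):

from cdt.causality.pairwise import GNN

# Fewer restarts and shorter training trade some accuracy for speed;
# tueb is the dataset loaded in the test above.
m = GNN(nruns=6, train_epochs=500, test_epochs=500)
r = m.predict_dataset(tueb)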

NCC outputs a continuous value

The NCC code is supposed to output 1 or -1, but that does not happen. I tried it on the example used in the documentation.
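If the network's raw score is what comes back, the documented ±1 labels can presumably be recovered by taking the sign of the output, with the magnitude read as confidence:

import numpy as np

scores = np.array([89.89, -12.30, 0.54])  # stand-in raw NCC outputs
labels = np.sign(scores)                  # maps to the documented {-1, +1}
print(labels)                             # [ 1. -1.  1.]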

TCEP dataset incoherent with 'official' version?

Hi,
After I opened the issue about the labels all being set to 1, I went to check the TCEP reference website to identify some pairs that got permuted, and so on.

I stumbled across something strange: in the original dataset, some of the variables are multivariate. This can be seen, for example, in pair 54 or pair 71.

However, in the current version of cdt, when checking these two pairs, one finds 1D variables.

> data.iloc[53]
A    [43.51, 41.33, 36.78, -8.82, 34.61, 40.11, 12....
B    [42.0, 75.0, 69.0, 42.0, 76.0, 72.0, 77.0, 81....
Name: pair54, dtype: object
> data.iloc[53]['A']
array([ 43.51,  41.33,  36.78,  -8.82,  34.61,  40.11,  12.52, -35.18,
        48.12,  40.24,  25.4 ,  26.19,  23.71,  13.09,  53.97,  50.83,
        17.25,   6.48,  27.44, -16.3 ,  43.86, -24.66, -15.78,   4.94,
        42.71,  12.35,  -3.38,  11.54,   3.86,  45.42,   4.36,  12.11,
        49.42, -33.45,  31.14, -11.7 ,  -4.25,  -4.33,   9.92,   5.33,
        45.8 ,  23.  ,  35.17,  50.08,  55.68,  11.58,  18.48,  -0.23,
        30.06,  13.7 ,   3.75,  15.33,  59.43,   9.  ,   6.92, -18.14,
        60.17,  48.86,   4.93, -17.54,   0.39,  13.44,  41.7 ,  52.52,
         5.54,  37.97,  12.05,  16.  ,  13.47,  14.62,   9.54,  11.86,
         6.8 ,  18.54,  14.08,  22.3 ,  47.5 ,  64.14,  28.63,  -6.19,
        35.71,  33.32,  53.34,  31.79,  41.9 ,  18.  ,  35.68,  31.94,
        51.18,  -1.28,  39.02,  37.51,  29.37,  42.87,  17.97,  56.95,
        33.89, -29.3 ,   6.31,  32.88,  54.69,  49.61,  22.18,  42.  ,
       -18.92, -13.99,   3.15,   4.17,  12.65,  35.9 ,  14.6 ,  18.07,
       -20.16,  19.42,  47.91,  42.46,  33.99, -25.97,  19.74, -22.57,
        27.71,  52.37,  12.1 , -22.28, -41.29,  12.15,  13.52,   9.06,
        59.91,  23.61,  33.68,  31.88,   8.99,  -9.47, -25.3 , -12.09,
        14.58,  52.22,  38.71,  18.45,  25.29,  47.01, -20.87,  44.45,
        55.76,  27.15,  14.  , -13.83,   0.34,  24.67,  14.7 ,  44.8 ,
         8.47,   1.29,  48.21,  46.05,  -9.43,   2.04, -25.75,  40.42,
         6.92,  13.2 ,  15.63,   5.82, -26.32,  59.33,  46.95,  33.52,
        38.57,  -6.17,  13.76,  -8.57,   6.12, -21.14,  10.66,  36.81,
        39.94,  37.95,   0.31,  50.44,  24.48,  51.5 ,  38.89,  18.34,
       -34.89,  41.32, -17.74,  10.5 ,  21.03,  15.36, -15.41, -17.82])

The same can be seen for pair 71. Is this a mistake, or just a shuffling of the data?
I made sure I set shuffle=False before testing the two pairs.
If the basic (non-shuffled) dataset is already shuffled, or has been pre-processed in some way to reduce dimensionality, could we have some explanation of how the two datasets relate to each other?

Any amount of information would help,
Thanks

NCC gives wrong prediction on TCEP?

I tested NCC with half of the TCEP pairs for training and half for testing.
When testing, I flip all pairs (so that X2 is the cause). However, NCC outputs a positive value for all pairs!
Code:

def test_NCC():
    from sklearn.model_selection import train_test_split
    from cdt.causality.pairwise import NCC
    from cdt.data import load_dataset
    tueb, labels = load_dataset('tuebingen')
    method = NCC
    print(method)
    m = method()
    X_tr, X_te, y_tr, y_te = train_test_split(tueb, labels, train_size=.5)
    m.fit(X_tr, y_tr, epochs=10000)
    r = m.predict_dataset(X_te.reindex(columns=['B', 'A']))
    print(r)

Outputs:

0       89.886803
1    42859.230469
2     2996.945312
3   351716.406250
4   218484.812500
5     1456.278320
6      354.131256
7      453.962494
8    29202.076172
9    47342.875000
10    2115.986084
11     175.141602
12    2060.776123
13   10275.829102
14    2584.913574
15    8027.451660
16     637.758789
17   49512.773438
18     794.610840
19     177.425110
20    5133.766602
21    2414.513916
22     205.962494
23     411.851135
24     186.423264
25     880.144958
26     173.254272
27      85.153000
28     758.132324
29     726.009766
30    2010.785767
31    1986.761475
32    1791.590332
33      32.296738
34    2300.482666
35   12707.833008
36   63790.007812
37    4901.006836
38     935.546875
39     232.197510
40    5229.793457
41    2120.424316
42     180.572327
43    2947.156738
44    2176.514160
45    2140.100098
46    6997.687988
47   28182.152344
48     881.467407
49    1656.368042

Mini-batch train

Hi,

I would like to suggest implementing mini-batch training. Specifically, I tried to run FSGNN on my data and got the following error:

not enough memory: you tried to allocate 160465GB. Buy new RAM! at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/TH/THGeneral.c:218

I looked into the code, but unfortunately I'm not familiar enough with GANs to modify it.

GNN never stops even when p-value < 0.01

I have a Tesla GPU with CUDA installed:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 00000000:65:00.0 Off | Off |
| N/A 83C P0 43W / 250W | 1087MiB / 16280MiB | 100% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2648 C python 1077MiB |

My question is that the GNN never stops, even when the smallest p-value has been obtained. What is wrong with my code? Please help.

import time
import pandas as pd
import networkx as nx

from cdt.independence.graph import FSGNN
Fsgnn = FSGNN(train_epochs=100, test_epochs=50, l1=0.1, batch_size=1000)

start_time = time.time()
ugraph = Fsgnn.predict(df, threshold=1e-7)
print("--- Execution time : %4.4s seconds ---" % (time.time() - start_time))
nx.draw_networkx(ugraph, font_size=8)  # The plot function allows for quick visualization of the graph.
# plt.show()
# List results
list00 = pd.DataFrame(list(ugraph.edges(data='weight')))

from cdt.causality.graph import CGNN

Cgnn = CGNN(nruns=16, train_epochs=200, test_epochs=100, batch_size=1000)
start_time = time.time()
dgraph = Cgnn.orient_undirected_graph(df, ugraph)
print("--- Execution time : %4.4s seconds ---" % (time.time() - start_time))

# Plot the output graph
nx.draw_networkx(dgraph, font_size=8)  # The plot function allows for quick visualization of the graph.
# plt.show()
# Print output results :
list22 = pd.DataFrame(list(dgraph.edges(data='weight')), columns=['Cause', 'Effect', 'Score'])
print(list22)

ANM, lack of Gamma in "Gamma HSIC"

Hi,
This might simply be a conceptual problem, or a lack of knowledge on my part.
Usually, using HSIC to compare two ANM candidates can be done by comparing the statistics directly, or by computing the related p-value.
However, to compute a p-value one needs to have some notion of the HSIC distribution under the null.
The classic paper from Gretton et al. proposes a Gamma approximation, giving specific plug-in values for the two Gamma parameters in terms of the expectation and variance of the HSIC statistic.
If I had to compute the p-value myself, I would use that approximation and evaluate the Gamma CDF parametrized by those plug-in values.
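To make this concrete, here is a minimal sketch of what I have in mind; hsic_stat, mean_h and var_h are names of my own (the statistic and estimates of its null expectation and variance), not the toolbox's:

from scipy.stats import gamma

def hsic_gamma_pvalue(hsic_stat, mean_h, var_h):
    # Method-of-moments plug-in: shape k = E[HSIC]^2 / Var[HSIC],
    # scale theta = Var[HSIC] / E[HSIC]
    k = mean_h ** 2 / var_h
    theta = var_h / mean_h
    return 1.0 - gamma.cdf(hsic_stat, a=k, scale=theta)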

I am aware there might be other ways to do this; however, your snippet in the anm method does not seem to compute p-values, but only test statistics.
I might be wrong, but the variable names as well as the description of the method suggest this.

Am I wrong? Right? If either, how so?

Thanks for any additional information on this topic,
I would ideally like to design a test which detects whenever a model satisfies an ANM with low Type I and II error.

ANM score

What does the value on the ANM score indicate?
I never get a negative value, even when I reverse the order of the arguments to anm_score.

For Eg. anm_score(x1, y1) ,anm_score(y1, x1) = (1.492447882112475, 0.6205516300704923)
and anm_score(x2, y2), anm_score(y2, x2) = (1.2622033043127454, 1.9067645693295359)

What can I infer from these values?
Does this mean that x1 causes y1 since 1.49 > 0.620 and y2 causes x2 since 1.90 > 1.26 ?

Is HSIC Lasso different from KCI ?

Hello,
I was wondering whether the independence test found in the HSIC Lasso script was the one introduced in the paper Kernel-based Conditional Independence Test and Application in Causal Discovery ?
I'm asking this because they seem related (both based on HSIC), but there's no citation in the code you provide, neither of the Gretton et al. paper nor of the one I'm referring to.

If I had to guess, I would bet on a standard independence test, not conditional independence. Is that the case?

Thanks,
Arno V.

import error when importing cdt.causality.pairwise.ANM

When importing the ANM package from cdt.causality.pairwise, I got the following error message: "ImportError: cannot import name 'GraphLasso'".
GraphLasso is a class from the sklearn.covariance package.
I checked the sklearn documentation and the class is apparently only documented for version 0.11. I tried to pip install this older version of sklearn, but got the error message "ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.", which apparently has something to do with the conversion from Python 2 to Python 3.
So my guess is that GraphLasso is no longer supported in recent versions of sklearn.
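If the removal really is the cause, a small compatibility shim might restore the import; this is an untested sketch on my part, relying on the fact that sklearn renamed the estimator to GraphicalLasso:

try:
    from sklearn.covariance import GraphLasso
except ImportError:
    # sklearn >= 0.22 removed GraphLasso in favor of GraphicalLasso
    from sklearn.covariance import GraphicalLasso as GraphLasso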

I use python 3.6.8 in a virtual environment.
My Pytorch version is 1.3.1
My sklearn version is 0.22

I haven't tried running in a docker image, because I am new to docker.
Any help would be much appreciated.

Matthias

ImportError: cannot import name '__version__' from 'sklearn' (unknown location)

While attempted to import cdt, I get an Import error.

For context:
I'm on a Mac, running python 3 and the most recent versions of sklearn and cdt.

Full Traceback:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-166-859cb185c28f> in <module>
      2 #from sklearn.gaussian_process import GaussianProcessRegressor
      3 #import networkx as nx
----> 4 import cdt

~/anaconda3/lib/python3.7/site-packages/cdt/__init__.py in <module>
     26 """
     27 
---> 28 import cdt.causality
     29 import cdt.independence
     30 import cdt.data

~/anaconda3/lib/python3.7/site-packages/cdt/causality/__init__.py in <module>
     22 .. SOFTWARE.
     23 """
---> 24 from .pairwise import __init__
     25 from .graph import __init__

~/anaconda3/lib/python3.7/site-packages/cdt/causality/pairwise/__init__.py in <module>
     22 .. SOFTWARE.
     23 """
---> 24 from .ANM import ANM
     25 from .CDS import CDS
     26 from .Jarfo import Jarfo

~/anaconda3/lib/python3.7/site-packages/cdt/causality/pairwise/ANM.py in <module>
     27 """
     28 
---> 29 from sklearn.gaussian_process import GaussianProcessRegressor
     30 from sklearn.preprocessing import scale
     31 from .model import PairwiseModel

~/anaconda3/lib/python3.7/site-packages/sklearn/gaussian_process/__init__.py in <module>
     11 """
     12 
---> 13 from .gpr import GaussianProcessRegressor
     14 from .gpc import GaussianProcessClassifier
     15 from . import kernels

~/anaconda3/lib/python3.7/site-packages/sklearn/gaussian_process/gpr.py in <module>
     12 from scipy.optimize import fmin_l_bfgs_b
     13 
---> 14 from ..base import BaseEstimator, RegressorMixin, clone
     15 from ..base import MultiOutputMixin
     16 from .kernels import RBF, ConstantKernel as C

~/anaconda3/lib/python3.7/site-packages/sklearn/base.py in <module>
     13 import numpy as np
     14 
---> 15 from . import __version__
     16 from .utils import _IS_32BIT
     17 

ImportError: cannot import name '__version__' from 'sklearn' (unknown location)

CGNN

Hey man, thanks for this, looking forward to exploring. Just a quick question regarding an error: do you know what is causing this?

[screenshot of the error]

NCC example: pytorch error

Hi,

I've been trying the NCC-example from the docs and get an error from torch:

from cdt.causality.pairwise import NCC
import networkx as nx
import matplotlib.pyplot as plt
from cdt.data import load_dataset
from sklearn.model_selection import train_test_split
data, labels = load_dataset('tuebingen')
X_tr, X_te, y_tr, y_te = train_test_split(data, labels, train_size=.5)
obj = NCC()
obj.fit(X_tr, y_tr)
Epochs: 0%| | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
File "", line 1, in
File "/opt/conda/lib/python3.6/site-packages/cdt-0.5.5-py3.6.egg/cdt/causality/pairwise/NCC.py", line 183, in fit
for (batch, label), i in zip(da, t):
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 529, in __next__
batch = self.collate_fn([self.dataset[i] for i in indices])
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 68, in default_collate
return [default_collate(samples) for samples in transposed]
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 68, in
return [default_collate(samples) for samples in transposed]
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 43, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 349 and 392 in dimension 3 at /tmp/pip-req-build-l1dtn3mo/aten/src/THC/generic/THCTensorMath.cu:71

The error is almost the same from within
python 3.6.8
pytorch 1.1.0
cdt 0.5.5
or
with the nvidia-docker:0.5.5

I'm not sure if it's up to my hardware.
I get it on my notebook:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| 0 GeForce 940MX Off | 00000000:02:00.0 Off | N/A |
| N/A 40C P0 N/A / N/A | 269MiB / 2004MiB | 0% Default |

and on a workstation:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| 0 Quadro P600 Off | 00000000:18:00.0 Off | N/A |
| 34% 40C P8 N/A / N/A | 17MiB / 1999MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M40 Off | 00000000:3B:00.0 Off | Off |
| N/A 56C P8 17W / 250W | 0MiB / 12215MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M40 Off | 00000000:D8:00.0 Off | Off |
| N/A 65C P8 17W / 250W | 0MiB / 12215MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

Thank you in advance for any hint.
Best
Tom

ModuleNotFoundError: No module named 'sklearn.gaussian_process'

While attempted to import cdt, I get a ModuleNotFound error.

For context:
I'm on a Mac, running python 3.

Full Traceback:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-37-0d5686c1281b> in <module>
      1 import sklearn
      2 import networkx as nx
----> 3 import cdt

~/anaconda3/lib/python3.7/site-packages/cdt/__init__.py in <module>
     26 """
     27 
---> 28 import cdt.causality
     29 import cdt.independence
     30 import cdt.data

~/anaconda3/lib/python3.7/site-packages/cdt/causality/__init__.py in <module>
     22 .. SOFTWARE.
     23 """
---> 24 from .pairwise import __init__
     25 from .graph import __init__

~/anaconda3/lib/python3.7/site-packages/cdt/causality/pairwise/__init__.py in <module>
     22 .. SOFTWARE.
     23 """
---> 24 from .ANM import ANM
     25 from .CDS import CDS
     26 from .Jarfo import Jarfo

~/anaconda3/lib/python3.7/site-packages/cdt/causality/pairwise/ANM.py in <module>
     27 """
     28 
---> 29 from sklearn.gaussian_process import GaussianProcessRegressor
     30 from sklearn.preprocessing import scale
     31 from .model import PairwiseModel

ModuleNotFoundError: No module named 'sklearn.gaussian_process'

Error when running PC alg (some CSV file)

Hi,
As you know, I have installed most of the packages and attempted to run PC alg on sachs as well as fsgnn.
I thought I was out of trouble, as the R setup looked fine, but when calling the following Python 3.6 snippet:

from cdt.causality.graph.PC import PC

# data is a pandas DataFrame loaded beforehand
pc = PC(CItest="hsic", method_indep="rcit")
pcgraph = pc.predict(data)

The interpreter spat out the following error message:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File b'/tmp/cdt_pc_0f4d2452-ff3c-4a84-81dd-73ab5b4d474b//result.csv' does not exist: b'/tmp/cdt_pc_0f4d2452-ff3c-4a84-81dd-73ab5b4d474b//result.csv'

Any idea of what this is about?

Regards,
A.V

NB: here is the full error printout

--------------------------------------------------------------------
FileNotFoundError                  Traceback (most recent call last)
<ipython-input-50-db1fca6dbddc> in <module>
----> 1 pcgraph = pc.predict(data)

~/progtools/python/virtualenvs/tfcuda/lib/python3.6/site-packages/cdt/causality/graph/model.py in predict(self, df_data, graph, **kwargs)
     61         """
     62         if graph is None:
---> 63             return self.create_graph_from_data(df_data, **kwargs)
     64         elif isinstance(graph, nx.DiGraph):
     65             return self.orient_directed_graph(df_data, graph, **kwargs)

~/progtools/python/virtualenvs/tfcuda/lib/python3.6/site-packages/cdt/causality/graph/PC.py in create_graph_from_data(self, data, **kwargs)
    257         self.arguments['{VERBOSE}'] = str(self.verbose).upper()
    258 
--> 259         results = self._run_pc(data, verbose=self.verbose)
    260 
    261         return nx.relabel_nodes(nx.DiGraph(results),

~/progtools/python/virtualenvs/tfcuda/lib/python3.6/site-packages/cdt/causality/graph/PC.py in _run_pc(self, data, fixedEdges, fixedGaps, verbose)
    300         except Exception as e:
    301             rmtree(run_dir)
--> 302             raise e
    303         except KeyboardInterrupt:
    304             rmtree(run_dir)

~/progtools/python/virtualenvs/tfcuda/lib/python3.6/site-packages/cdt/causality/graph/PC.py in _run_pc(self, data, fixedEdges, fixedGaps, verbose)
    296 
    297             pc_result = launch_R_script("{}/R_templates/pc.R".format(os.path.dirname(os.path.realpath(__file__))),
--> 298                                         self.arguments, output_function=retrieve_result, verbose=verbose)
    299         # Cleanup
    300         except Exception as e:

~/progtools/python/virtualenvs/tfcuda/lib/python3.6/site-packages/cdt/utils/R.py in launch_R_script(template, arguments, output_function, verbose, debug)
    198         if not debug:
    199             rmtree(base_dir)
--> 200         raise e
    201     except KeyboardInterrupt:
    202         if not debug:

~/progtools/python/virtualenvs/tfcuda/lib/python3.6/site-packages/cdt/utils/R.py in launch_R_script(template, arguments, output_function, verbose, debug)
    192                                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    193             process.wait()
--> 194             output = output_function()
    195 
    196     # Cleaning up

~/progtools/python/virtualenvs/tfcuda/lib/python3.6/site-packages/cdt/causality/graph/PC.py in retrieve_result()
    284 
    285         def retrieve_result():
--> 286             return read_csv('{}/result.csv'.format(run_dir), delimiter=',').values
    287 
    288         try:

~/progtools/python/virtualenvs/tfcuda/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    683         )
    684 
--> 685         return _read(filepath_or_buffer, kwds)
    686 
    687     parser_f.__name__ = name

~/progtools/python/virtualenvs/tfcuda/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    455 
    456     # Create the parser.
--> 457     parser = TextFileReader(fp_or_buf, **kwds)
    458 
    459     if chunksize or iterator:

~/progtools/python/virtualenvs/tfcuda/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    893             self.options["has_index_names"] = kwds["has_index_names"]
    894 
--> 895         self._make_engine(self.engine)
    896 
    897     def close(self):

~/progtools/python/virtualenvs/tfcuda/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1133     def _make_engine(self, engine="c"):
   1134         if engine == "c":
-> 1135             self._engine = CParserWrapper(self.f, **self.options)
   1136         else:
   1137             if engine == "python":

~/progtools/python/virtualenvs/tfcuda/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1915         kwds["usecols"] = self.usecols
   1916 
-> 1917         self._reader = parsers.TextReader(src, **kwds)
   1918         self.unnamed_cols = self._reader.unnamed_cols
   1919 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File b'/tmp/cdt_pc_0f4d2452-ff3c-4a84-81dd-73ab5b4d474b//result.csv' does not exist: b'/tmp/cdt_pc_0f4d2452-ff3c-4a84-81dd-73ab5b4d474b//result.csv'

Potential Bug for GNN when computing the loss function

Hi,

I just read through the CGNN code, mainly interested in the pairwise version.

It looks like the criterion computes MMD(y, y_pred):
https://github.com/FenTechSolutions/CausalDiscoveryToolbox/blob/32200779ab9b63762be3a24a2147cff09ba2bb72/cdt/causality/pairwise/GNN.py#L111

However, in the original paper they compute MMD([x, y], [x, y_pred]):
https://github.com/GoudetOlivier/CGNN/blob/e3fcfc570e30fb8dad8bf00f619ef3c21998bb90/Code/cgnn/GNN.py#L70
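To make the difference explicit, a rough sketch (th is torch, criterion stands for the MMD loss, and I assume 2-D tensors; this is my paraphrase, not the actual code):

# What the CDT pairwise GNN seems to do:
loss = criterion(y, y_pred)

# What the original CGNN code does:
loss = criterion(th.cat([x, y], dim=1), th.cat([x, y_pred], dim=1))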

Thanks a lot for the repo and the reply. Helped me understand a lot of new things.

ValueError when running VarLiNGAM

When I run VarLiNGAM on the Finance dataset (http://www.skleinberg.org/data/FinanceCPT.tar.gz), I get a ValueError.

df_data = pd.read_csv(datafile)
model = VarLiNGAM(lag=3)
result = model.create_graph_from_data(df_data)
The error is as follows

File "", line 1, in
runfile('F:/work_python_d_2/TimeSeriesCausalDiscovery/VARLiNGAM/VARLiNGAM_Finance.py', wdir='F:/work_python_d_2/TimeSeriesCausalDiscovery/VARLiNGAM')

File "D:\_work\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)

File "D:\_work\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "F:/work_python_d_2/TimeSeriesCausalDiscovery/VARLiNGAM/VARLiNGAM_Finance.py", line 31, in <module>
G = model.create_graph_from_data(df_data)

File "D:\_work\Anaconda3\lib\site-packages\cdt\timeseries\graph\VARLiNGAM.py", line 75, in create_graph_from_data
inst, lagged = self._run_varLiNGAM(data.values, verbose=self.verbose)

File "D:\_work\Anaconda3\lib\site-packages\cdt\timeseries\graph\VARLiNGAM.py", line 109, in _run_varLiNGAM
Bhat_ = np.dot((Ident - Bo_), Mt_)

ValueError: shapes (25,25) and (75,25) not aligned: 25 (dim 1) != 75 (dim 0)

issue in tutorial

print(nx.adj_matrix(output_graph).todense()) does not work. It requires

print(nx.adjacency_matrix(output_graph).todense())

instead

CGNN results question

Hi,

So I have tried to run the experiments again for the CGNN pairwise experiments.

And I can confirm to get the same results for the Multi, Gauss, Net, Tueb datasets in terms of AUPRC (using 12 different runs to ensemble)
AUPR: 0.95 MULTI
AUPR: 0.80 GAUSS
AUPR: 0.90 NET

However, when I look at the accuracy, i.e. predicting the actual direction, I get:
0.43, 0.46 and 0.49 respectively.

I compute the accuracy with the following script:

import numpy as np
from numpy import genfromtxt
from cdt.data import load_dataset
from cdt.metrics import precision_recall

for dataset_name in ['multi', 'gauss', 'net']:
    data, labels = load_dataset(dataset_name)
    res = genfromtxt('results/res2_{}.csv'.format(dataset_name), delimiter=',', skip_header=True)
    labels = labels.to_numpy()
    acc = 0
    # a pair counts as correct when the sign of the score matches the label
    for idx, score in enumerate(res[:, 1]):
        if (score < 0 and labels[idx] == -1) or (score > 0 and labels[idx] == 1):
            acc += 1

    acc /= (res.shape[0] - 1)
    print(res.shape[0])
    print('{} ACC : {}'.format(dataset_name, acc))
    aupr, curve = precision_recall(labels[:res.shape[0]], res[:, 1])
    print('AUPR: {}'.format(aupr))

This method also gives me around 74% (unweighted) on the Tueb dataset.

So my question is whether this is expected, whether I should be computing the accuracy differently, or maybe whether the accuracy doesn't matter at all?

Thanks for the clarification in advance.

Best

Failed to import cdt on a machine without GPU.

I am using Windows 7 and I don't have a GPU.

I create a fresh conda environment as follows.

conda create --prefix venv python=3.7
conda activate ./venv
# Note that I don't have a GPU. https://pytorch.org/get-started/locally/ 
conda install pytorch torchvision cpuonly -c pytorch
conda install pip
pip install cdt

Packages:

  • pytorch: 1.2.0
  • torchversion: 0.4.0
  • cdt: 0.5.8
  • ...

Executing import cdt failed.
[screenshot of the traceback]

The following lines caused the problem. https://github.com/Diviyan-Kalainathan/CausalDiscoveryToolbox/blob/6df55f4ec0800a377cb4688b7eaedb2b1f75f3fe/cdt/utils/Settings.py#L161-L163

My workaround is executing os.environ["CUDA_VISIBLE_DEVICES"]="[]" before executing import cdt.
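In code, the workaround looks like this (a minimal sketch of what I run):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "[]"  # hide CUDA devices from cdt's GPU detection

import cdt  # now imports fine on a machine without a GPU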

problem about the SAM model

Hi, thanks for your great work. I have tried the SAM model on a 90-variable graph estimation.
However, after an hour, I found that the loss becomes NaN (it's NOT always NaN; sometimes it looks regular).

Is that a problem? Do I need to change the lr or any other parameters? (I just use the default parameters.)

Command:

obj = SAM(gpus=3, njobs=6, nruns=16, batchsize=1024)
output = obj.predict(data)

Output:
397/11000 [2:52:25<76:44:58, 26.06s/it, disc=nan, gen=nan, regul_loss=nan, tot=nan]
402/11000 [2:52:16<75:41:45, 25.71s/it, disc=6.07, gen=-0.992, regul_loss=0.49, tot=-78.9]

FSGNN strange matrix multiplication

Hello,
I played with the FSGNN example avalaible, modifying very little pieces of the code (most of it, especially the NN-related part, is the same as the original).
However after training, the whole thing crashes because of some matrix multiplication.
You can find a screen capture of the error message (inside the jupyter notebook) below.
[screenshot of the error message, taken 2019-10-01]

I think you meant to write matrix_results = matrix_results.dot(matrix_results.T) or something like that (matrix_results = matrix_results @ matrix_results.T works too, I believe); it would only make sense, as (2,11) x (11,2) x (2,11) is a valid chain of matrix multiplications. See the quick check below.
Maybe A*B is now performing an element-wise product?
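As a quick sanity check of that hypothesis (plain numpy, nothing CDT-specific):

import numpy as np

A = np.ones((2, 11))
print((A @ A.T).shape)  # (2, 2): true matrix product
print((A * A).shape)    # (2, 11): * on ndarrays is element-wise
# A * A.T raises ValueError: operands could not be broadcast together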

Regards,
Arno V.

FileNotFoundError

When I'm trying to run some examples with different parameters, I get this error:

FileNotFoundError: File b'/tmp/cdt_CAMbc4bbf1c-b80b-4e8b-9184-23ba73222cce/result.csv' does not exist

Here is the snippet of code I'm trying to run

import networkx as nx
from cdt.causality.graph import CAM
from cdt.data import load_dataset
data, graph = load_dataset("sachs")
obj = CAM(selmethod='gam')
output = obj.predict(data)

Fix examples/Discovery_LUCAS.ipynb

Cgnn.predict(data, graph=ugraph, nb_runs=16, train_epochs=1500, test_epochs=1000)

The CGNN predict function doesn't accept nb_runs, train_epochs and test_epochs anymore. It has to be called like this:

Cgnn = CGNN(nb_runs=16, train_epochs=1500, test_epochs=1000)
Cgnn.predict(data, graph=ugraph)

Issue with running RScript in Mac

The subprocess.call function does not find Rscript. You need to provide the full path (e.g. /usr/local/bin/Rscript) in R.py to resolve this.
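For illustration, this is the kind of call that works once the absolute path is used (the script name here is only a placeholder):

import subprocess

# the absolute path makes the binary findable regardless of PATH
subprocess.call(['/usr/local/bin/Rscript', '--vanilla', 'script.R'])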

Sign of the IGCI score

Hi Diviyan,

I think the sign of the IGCI score is wrong.

In the UAI 2010 paper, on page 5, at the beginning of sec 3.5, Cx2y = S(Py) - S(Px) after preprocessing. And on page 3, after Postulate 2, the paper says negative value of Cx2y indicates X cause Y.

Thus, in the predict_proba method of the IGCI class, line 105, the entropy estimator calls should be swapped to eval_entropy(x) - eval_entropy(y). Then a positive return value indicates that X causes Y.
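In other words (sketch only; eval_entropy stands for the toolbox's entropy estimator):

score = eval_entropy(x) - eval_entropy(y)  # positive => X causes Y, matching Cx2y < 0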

Running the code on the CEP dataset, the results also agree with the above.

Best,
Abel

NCC.py

It is written in the code that it outputs 1 or -1, but a sigmoid has been used, which outputs values between 0 and 1. So data with which type of labels is required? Please resolve this inconsistency.

Is Graphviz needed when using R's pcalg?

My question is related to these explanations about the pcalg package in R.
I am currently installing the required packages to run CDT's graph-related PC.py.
As the plotting is done using networkx, I figured Graphviz would not be needed.

Am I guessing correctly?
Is there anything more I should know about these R requirements (say, about the path to the packages or some environment variables)?

Thanks,
A.V

Reproducing results on Tuebingen

I ran GNN with default parameters on the Tuebingen dataset, with 10 epochs for training, 10 for testing, and nb_max_runs=5. I got an AUC of 54 (in https://arxiv.org/pdf/1711.08936.pdf it is specified that I should get a higher score). Am I doing something wrong?

Note: running 1000 epochs is infeasible, since it already takes more than 5 hours with 10 epochs.

Thanks!!

NCC code not coherent with the paper?

Hi,
This could obviously be due to a misunderstanding on my part, but I thought the NCC framework should be built from 2 MLPs, and from my understanding MLPs typically do not use convolutions.
However, the NCC code I found in CDT is the following:

        self.conv = th.nn.Sequential(th.nn.Conv1d(2, n_hiddens, kernel_size),
                                     th.nn.ReLU(),
                                     th.nn.Conv1d(n_hiddens, n_hiddens,
                                                  kernel_size),
                                     th.nn.ReLU())

In addition, the original paper recommends enforcing 1 - NCC(x1,x2) = NCC(x2,x1) (where (x1,x2) is an n-sample of pairs). They do this by having the composite output 0.5*(1 - NCC(x2,x1) + NCC(x1,x2)).
I am not sure which lines are responsible for this, if they exist; see the sketch below for what I would expect.
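A rough sketch of the composite output I am looking for (ncc stands for the network's raw forward pass; the name is mine, not the code's):

def symmetric_output(ncc, x1, x2):
    # composite output from the paper: antisymmetric by construction
    return 0.5 * ((1 - ncc(x2, x1)) + ncc(x1, x2))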

Any help would be greatly appreciated.
Regards,
Arno V.

NCC got random guess performance on TCEP

I test NCC with half of the TCEP pairs for training and half for testing, randomly re-splitting train and test 100 times.
Code:

import numpy as np
from sklearn.model_selection import train_test_split
from cdt.causality.pairwise import NCC

tueb, labels = load_tuebingen(shuffle=True)  # loader from my cdt version

def test_NCC():
    m = NCC()
    print(m)

    accs = []
    for n in range(100):
        X_tr, X_te, y_tr, y_te = train_test_split(tueb, labels, train_size=.5)
        m.fit(X_tr, y_tr, epochs=10000)
        r = m.predict_dataset(X_te)
        acc = np.mean(r.values * y_te.values > 0)
        accs.append(acc)
        print(acc, file=open('ncc_.txt', 'a'))
    print(np.mean(accs), np.std(accs), file=open('ncc_.txt', 'a'))

The average accuracy over ~60 runs is 50.03%.

A first guess is overfitting, but I am also running with epochs=500 and there seems to be no big difference (although the training accuracies look less like overfitting).

Thank you,
Abel

Higher dimension data support

Hi,

I just started to look into your work and really like it.

I started out with the LUCAS example, and wonder if you have any plans to support high-dimensional features?

For example: I have a dataset where feature 1 is an array of length L, but feature 2 is just a single number.

Thanks for the great work, and thanks for using Pytorch.

Cheers,

Weights for TCEP dataset

Hi,
Many different papers on bivariate causal discovery discuss the necessity of attaching a weight to each pair, to account for the fact that some pairs come from the same joint distribution.

I do not see this as an option currently in CDT.
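For clarity, this is the kind of weighted scoring I mean; my own sketch, not CDT code, with the weights assumed to be provided alongside the dataset:

import numpy as np

def weighted_accuracy(preds, labels, weights):
    # each pair contributes to the score in proportion to its weight
    correct = (np.sign(preds) == np.sign(labels)).astype(float)
    return np.sum(weights * correct) / np.sum(weights)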

Would this possibly be an option in later releases? :)

Thanks!

CDT GPU Util: When using several Jupyter Notebooks, some do not detect CUDA.

Hello,
I have been relying on Jupyter and CDT to experiment the past few weeks.
Several times, some unwanted behavior has manifested itself:

  • Trying one algorithm with CUDA and visualizing results, then opening a new notebook to test another algorithm, the new notebook's CDT instance did not detect the GPU (in this case, shutting down the notebook that had detected CUDA was enough to solve the problem).
  • A few days later, after making sure all other notebooks were shut down, I experimented again. For some reason, after modifying the code and pressing "Restart & Clear Output", the same notebook that could previously detect my single CUDA GPU no longer did.
  • After closing the entire Jupyter server and launching a single notebook file again, I could use CDT with CUDA successfully.

Is this behavior expected? Could the new notebook take priority over the old ones?
Here are screenshots of the same notebook before and after restarting the Jupyter server.
[screenshots: the notebook first failing to detect CUDA, then detecting it after the server restart]
