timelovercc / caf-gnn

[CIKM 2023] Towards Fair Graph Neural Networks via Graph Counterfactual.

Home Page: https://arxiv.org/abs/2307.04937

License: MIT License

Python 68.43% Shell 1.87% Jupyter Notebook 29.71%

causal-inference counterfactual deep-learning graph-neural-networks pytorch

caf-gnn's Introduction

Hi there 👋

🌱 I’m learning machine learning.

caf-gnn's Issues

Two Modules Missing in setup.sh & a Small Mistake in README.md

Hi, thanks for the great work! Two issues came up while I was reproducing it.

1. Two Modules Missing in setup.sh

Running the code also requires the pandas and rich modules, but setup.sh does not install them.

2. A Small Mistake in README.md

In this line, the dataset_name should be 'bail', since the line above says so.

Thanks again for the great work!

Request for details on the counterfactual fairness calculation implementation

Thanks for your timely reply!
In GEAR's evaluation, subgraphs are generated for each node to calculate the mean cf. Do you also generate subgraphs when deriving the cf metric, or do you simply follow the definition, with something like cf = 1 - (np.sum(y_pred_cf == y_pred) / n)?

# For convenience, here is how GEAR evaluates cf with subgraphs.
# compute_loss, get_all_node_emb and fair_metric are helpers assumed to be defined elsewhere in GEAR's code.
import torch
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate(model, data, subgraph, cf_subgraph_list, labels, sens, idx_select, type='all'):
    loss_result = compute_loss(model, subgraph, cf_subgraph_list, labels, idx_select)
    if type == 'easy':
        eval_results = {'loss': loss_result['loss'], 'loss_c': loss_result['loss_c'], 'loss_s': loss_result['loss_s']}

    elif type == 'all':
        n = len(labels)
        idx_select_mask = (torch.zeros(n).scatter_(0, idx_select, 1) > 0)  # size = n, bool

        # performance
        emb = get_all_node_emb(model, idx_select_mask, subgraph, n)
        output = model.predict(emb)
        output_preds = (output.squeeze() > 0).type_as(labels)

        auc_roc = roc_auc_score(labels.cpu().numpy()[idx_select], output.detach().cpu().numpy())
        f1_s = f1_score(labels[idx_select].cpu().numpy(), output_preds.cpu().numpy())
        acc = accuracy_score(labels[idx_select].cpu().numpy(), output_preds.cpu().numpy())

        # fairness
        parity, equality = fair_metric(output_preds.cpu().numpy(), labels[idx_select].cpu().numpy(),
                                       sens[idx_select].numpy())
        # counterfactual fairness
        cf = 0.0
        for si in range(len(cf_subgraph_list)):
            cf_subgraph = cf_subgraph_list[si]
            emb_cf = get_all_node_emb(model, idx_select_mask, cf_subgraph, n)
            output_cf = model.predict(emb_cf)
            output_preds_cf = (output_cf.squeeze() > 0).type_as(labels)
            cf_si = 1 - (output_preds.eq(output_preds_cf).sum().item() / idx_select.shape[0])
            cf += cf_si
        cf /= len(cf_subgraph_list)

        eval_results = {'acc': acc, 'auc': auc_roc, 'f1': f1_s, 'parity': parity, 'equality': equality, 'cf': cf,
                        'loss': loss_result['loss'], 'loss_c': loss_result['loss_c'], 'loss_s': loss_result['loss_s']}  # counterfactual_fairness
    return eval_results
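
For comparison, the definition-based variant mentioned in the question (no subgraph generation) can be sketched as follows. This only illustrates the formula cf = 1 - (np.sum(y_pred_cf == y_pred) / n) and the averaging over several counterfactuals seen in the GEAR snippet; the function names and toy predictions are made up for the example and are not taken from CAF or GEAR.

```python
import numpy as np


def counterfactual_flip_rate(y_pred, y_pred_cf):
    """Fraction of nodes whose predicted label changes under the counterfactual."""
    y_pred = np.asarray(y_pred)
    y_pred_cf = np.asarray(y_pred_cf)
    if y_pred.shape != y_pred_cf.shape:
        raise ValueError("predictions must have the same shape")
    # Same quantity as 1 - (np.sum(y_pred_cf == y_pred) / n)
    return 1.0 - float(np.mean(y_pred == y_pred_cf))


def mean_counterfactual_flip_rate(y_pred, y_pred_cf_list):
    """Average the flip rate over several counterfactual prediction vectors."""
    rates = [counterfactual_flip_rate(y_pred, cf) for cf in y_pred_cf_list]
    return float(np.mean(rates))


# Toy predictions (hypothetical): 4 nodes, 2 counterfactual settings.
y_pred = np.array([1, 0, 1, 1])
y_pred_cf_a = np.array([1, 0, 0, 1])  # one prediction flips
y_pred_cf_b = np.array([0, 0, 0, 1])  # two predictions flip

rate_a = counterfactual_flip_rate(y_pred, y_pred_cf_a)
rate_b = counterfactual_flip_rate(y_pred, y_pred_cf_b)
mean_rate = mean_counterfactual_flip_rate(y_pred, [y_pred_cf_a, y_pred_cf_b])
print(rate_a, rate_b, mean_rate)
```

A lower value means fewer predictions change when the sensitive attribute is counterfactually altered. Whether CAF computes its reported numbers this way is exactly what the issue asks.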

An Error Occurs when reproducing CAF & A question about Data Normalization

Hi.

1. An Error Occurs when reproducing CAF

An error occurs when I reproduce CAF, at the line trainer.fit(model, datamodule=data_module):

RuntimeError: It looks like your LightningModule has parameters that were not used in producing the loss returned by training_step. If this is intentional, you must enable the detection of unused parameters in DDP, either by setting the string value strategy='ddp_find_unused_parameters_true' or by setting the flag in the strategy with strategy=DDPStrategy(find_unused_parameters=True).
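
A minimal sketch of the workaround named in the error message, assuming the repository builds its Lightning Trainer from keyword arguments. The helper name ddp_trainer_kwargs is hypothetical; the strategy string is copied from the message above, and "ddp" is the plain DDP strategy name.

```python
def ddp_trainer_kwargs(find_unused_parameters=True):
    """Build Trainer keyword arguments that select the DDP strategy variant."""
    if find_unused_parameters:
        # String value quoted in the RuntimeError message.
        strategy = "ddp_find_unused_parameters_true"
    else:
        strategy = "ddp"
    return {"strategy": strategy}


kwargs = ddp_trainer_kwargs()
print(kwargs)

# Intended use (requires Lightning, so it is left commented out here):
# trainer = Trainer(**kwargs)
# trainer.fit(model, datamodule=data_module)
```

This setting only makes DDP tolerate parameters that receive no gradient; it does not explain why some parameters of the module are unused in the loss, which may still be worth checking.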

2. A question about Data Normalization

I think the line self.data.x[:, sens_idx] = self.data.sens does not reassign the sensitive value $s$ to $s\in \lbrace0,1 \rbrace$, because it runs before the normalization.
To be specific, the documentation of torch_geometric.data.Dataset says:

The data object will be transformed before every access

which means that:

when executing dataset = Bail(...,transform=NormalizeFeatures()), the stored features are not yet normalized.
when executing data = dataset[0] (i.e. accessing the data object), the features data.x are implicitly normalized.

However, the line self.data.x[:, sens_idx] = self.data.sens is executed inside dataset = Bail(...,transform=NormalizeFeatures()); in other words, the sensitive values are reassigned before feature normalization.
Since your code uses row normalization (torch_geometric.transforms.NormalizeFeatures), $s$ ends up taking a variety of fractional values, $s \in (0,1)$, e.g. $0.18$ or $0.23$, depending on the individual's other feature values.
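
The ordering problem described above can be reproduced with a small NumPy sketch. It assumes a simplified stand-in for the real classes: row_normalize divides each row by its sum (the actual torch_geometric.transforms.NormalizeFeatures may differ in detail), and LazyDataset is a hypothetical toy class that applies its transform only on access.

```python
import numpy as np


def row_normalize(x):
    """Divide each row by its sum (sums below 1 are clamped to 1)."""
    row_sum = x.sum(axis=1, keepdims=True)
    return x / np.clip(row_sum, 1.0, None)


class LazyDataset:
    """Toy dataset: the transform runs on access, not at construction."""

    def __init__(self, x, sens, sens_idx, transform=None):
        self.x = x.astype(float).copy()
        self.sens = sens.astype(float)
        # Reassignment happens here, before any transform is applied.
        self.x[:, sens_idx] = self.sens
        self.transform = transform

    def get(self):
        return self.transform(self.x) if self.transform else self.x


sens_idx = 0
x = np.array([[1.0, 3.0, 1.0],
              [0.0, 2.0, 2.0],
              [1.0, 0.0, 4.0]])
sens = np.array([1.0, 0.0, 1.0])

ds = LazyDataset(x, sens, sens_idx, transform=row_normalize)
accessed = ds.get()
print(accessed[:, sens_idx])  # sensitive column is no longer binary

# One possible fix: reassign the sensitive column after normalization.
fixed = accessed.copy()
fixed[:, sens_idx] = sens
print(fixed[:, sens_idx])
```

In this toy case the rows with s = 1 end up with 0.2 in the sensitive column after access, while reassigning after the transform restores values in {0, 1}.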

Some Issues in Data Processing Modules

Hi, I noticed some issues in all three files under ./src/datasets/: bail.py, credit.py, and german.py.
Take bail.py as an example:

Line 13: self.load(self.processed_paths[0])
Maybe you mean self.data = torch.load(self.processed_paths[0])[0]?
Line 14: sens_idx = 1
I think the sens_idx for bail is supposed to be 0.
Line 15: self.data.x[:, sens_idx] = self.data.sens
I don't understand what this line does, since self.data.x[:, sens_idx] always equals self.data.sens.
Line 31: self.save([data], self.processed_paths[0])
Maybe you mean torch.save([data], self.processed_paths[0])?

Request code for reproducing Counterfactual Fairness Metric for Synthetic dataset

Hi,
Could you provide the code or method for computing the counterfactual fairness metrics that yield the results in Table 3 and Figure 3 of the original paper? Is it aligned with the method used in GEAR? Thanks a lot!
