
Comments (10)

Waterpine commented on August 25, 2024

Thanks for your reply! I have modified the hyper-parameters and run the main function in train.py, benchmark_task_val(), several times. The results for DD and ENZYMES have improved; however, DD is at 79.52% and ENZYMES at 56.78%, which are still lower than the results reported in the paper. Also, I think choosing the maximum validation performance over all training iterations as the evaluation method is incorrect.


RexYing commented on August 25, 2024

Hi,

Although there is some variation in results, 47.74% for ENZYMES seems far too low. I am not sure what happened in your run; I retried and got the results I reported, without any tuning.
You can use any hidden-dim and output-dim between 30 and 64, an assign-ratio of 0.25 or 0.1, etc., and optionally add --linkpred and --dropout; with these options you should be able to get 60%+.

Also, the main function in train.py calls benchmark_task_val, as described in the paper.

Rex


Waterpine commented on August 25, 2024

Thanks for your reply! I have run the main function in train.py, benchmark_task_val(), but the maximum validation performance over all training iterations (there is no test-set performance) is 43.55% for ENZYMES and 78.02% for DD. The hyper-parameters are the ones given in the source code. Could you provide a script (including hyper-parameters) that reproduces the results in the paper? Thanks!


RexYing commented on August 25, 2024

python -m train --bmname=ENZYMES --assign-ratio=0.1 --hidden-dim=30 --output-dim=30 --cuda=1 --num-classes=6 --method=soft-assign

Got 63.7%

Many other configs are possible.
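As a rough illustration, a small sweep over these options could be scripted as follows. This is a minimal sketch: the flags mirror the command above, but the value grid is an illustrative guess, not the exact configuration used for the paper.

# Hypothetical hyper-parameter sweep for the diffpool training script.
# The flags mirror the command above; the value grid is an illustrative
# guess, not the exact configuration used for the paper.
import itertools
import subprocess

for hidden_dim, assign_ratio in itertools.product([30, 64], [0.1, 0.25]):
    cmd = [
        "python", "-m", "train",
        "--bmname=ENZYMES",
        "--method=soft-assign",
        "--num-classes=6",
        "--cuda=1",
        f"--hidden-dim={hidden_dim}",
        f"--output-dim={hidden_dim}",
        f"--assign-ratio={assign_ratio}",
        "--linkpred",  # optional, per the suggestion above
    ]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)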


RexYing commented on August 25, 2024

As I said, I'm confused about what has been tuned. The command I posted gives much higher results, as mentioned.
ENZYMES gets 60%+ even without any tuning; in general you don't even need to tune to reproduce the results.
You could also try the diffpool example at https://github.com/rusty1s/pytorch_geometric/tree/master/examples.
It should give similar results.

The validation accuracy was computed consistently across all experiments, and the same protocol has been adopted by GIN and others. This is mainly due to the small size of some of the datasets. You can of course run separate test-accuracy experiments; you just need to be consistent in the evaluation.
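For reference, here is a minimal sketch of a single DiffPool step using PyTorch Geometric's dense_diff_pool operator from the examples linked above. It assumes a recent torch_geometric install; the tensor shapes and random inputs are only illustrative.

# Minimal sketch of one DiffPool step via PyTorch Geometric's dense_diff_pool.
# Assumes torch and torch_geometric are installed; shapes and inputs are dummies.
import torch
from torch_geometric.nn import dense_diff_pool

B, N, F, C = 8, 50, 32, 5           # batch size, nodes, features, clusters
x = torch.randn(B, N, F)            # dense node features
adj = torch.rand(B, N, N)           # dense adjacency matrices
s = torch.randn(B, N, C)            # soft cluster-assignment logits

# Pool N nodes down to C clusters; also returns the link-prediction and
# entropy auxiliary losses used to regularize the assignments.
x_pooled, adj_pooled, link_loss, ent_loss = dense_diff_pool(x, adj, s)
print(x_pooled.shape, adj_pooled.shape)   # (8, 5, 32), (8, 5, 5)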


meltzerpete commented on August 25, 2024

Hi @RexYing. I love your paper; I think this is a really cool method. I just wanted to ask about how you measure performance for benchmarking.

Could you please clarify the process used here? As far as I can see, it goes as follows:

  • 10-fold cross-validation
  • for each fold, record the best validation score
  • keep the best validation accuracy from each fold and report the mean of these (although in the code it actually looks like the max of the mean validation accuracy across folds is reported?)

Is my understanding correct, or do you also use a separate test set for each fold, chosen based on the validation scores?


RexYing commented on August 25, 2024

Hi, your understanding is correct. The max of the mean is used, and I didn't specify a test set in the code.
Maybe refer to #17 for a bit more detail?
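To make the metric concrete, here is a minimal sketch of the "max of the mean" computation, assuming validation accuracies have been collected into an array of shape (num_epochs, num_folds). The array name and the dummy data are assumptions for illustration, not the repo's actual variables.

# Hypothetical illustration of the "max of the mean" metric discussed above.
# val_acc[e, f] is the validation accuracy at epoch e on fold f; the array
# name, shape, and random values are assumptions for this sketch.
import numpy as np

rng = np.random.default_rng(0)
val_acc = rng.uniform(0.4, 0.7, size=(100, 10))  # (num_epochs, num_folds), dummy data

mean_per_epoch = val_acc.mean(axis=1)  # average the 10 folds at each epoch
reported = mean_per_epoch.max()        # best epoch's mean -> reported accuracy
print(f"reported accuracy: {reported:.4f}")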


meltzerpete commented on August 25, 2024

Thanks, and sorry I did not see #17 - this has answered my question exactly!


Livetrack commented on August 25, 2024

Hi @RexYing,
I am trying to run your code with the script provided in example.sh, but like the OP I get results that do not match the paper: sometimes 0.48 and sometimes 0.56 for the test accuracy. (I am running benchmark_task rather than benchmark_task_val so that I can see the test accuracy; there is no test set in benchmark_task_val.)
Do you have a way to solve this problem?


RexYing commented on August 25, 2024

Hi, the accuracy reported is the mean of the validation accuracy over 10 cross-validation runs. All baselines were run with a hyper-parameter search and evaluated with the same metric.

