
Comments (10)

Waterpine commented on August 25, 2024

Thanks for your reply! I have modified the hyper-parameters and run the main function in train.py, benchmark_task_val(), several times. The results for DD and ENZYMES have improved; however, DD is at 79.52% and ENZYMES at 56.78%, which are still lower than the results reported in the paper. Also, I think choosing the maximum validation performance over all training iterations as the evaluation method is incorrect.


RexYing commented on August 25, 2024

Hi,

Although there is some variation in results, 47.74% for ENZYMES seems far too low. I am not sure what happened in your run; I retried and got the results I reported, without any tuning.
You can use any hidden-dim and output-dim between 30 and 64, an assign-ratio of 0.25 or 0.1, etc., and optionally add --linkpred and --dropout; with these options you should be able to get 60%+.

Also, the main function in train.py calls benchmark_task_val, as described in the paper.

Rex


Waterpine commented on August 25, 2024

Thanks for your reply! I have run the main function in train.py, benchmark_task_val(), but the maximum validation performance over all training iterations (there is no test-set performance) is 43.55% for ENZYMES and 78.02% for DD. The hyper-parameters are the ones given in the source code. Could you provide a script (including hyper-parameters) that reproduces the results in the paper? Thanks!


RexYing commented on August 25, 2024

python -m train --bmname=ENZYMES --assign-ratio=0.1 --hidden-dim=30 --output-dim=30 --cuda=1 --num-classes=6 --method=soft-assign

Got 63.7%

Many other configs are possible.
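As a rough illustration, a small sweep over these options could be scripted as follows. This is a minimal sketch: the flags mirror the command above, but the value grid is an illustrative guess, not the exact configuration used for the paper.

# Hypothetical hyper-parameter sweep for the diffpool training script.
# The flags mirror the command above; the value grid is an illustrative
# guess, not the exact configuration used for the paper.
import itertools
import subprocess

for hidden_dim, assign_ratio in itertools.product([30, 64], [0.1, 0.25]):
    cmd = [
        "python", "-m", "train",
        "--bmname=ENZYMES",
        "--method=soft-assign",
        "--num-classes=6",
        "--cuda=1",
        f"--hidden-dim={hidden_dim}",
        f"--output-dim={hidden_dim}",
        f"--assign-ratio={assign_ratio}",
        "--linkpred",  # optional, per the suggestion above
    ]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)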


RexYing commented on August 25, 2024

As I said, I'm confused about what has been tuned. The command I posted gives much higher results, as mentioned.
ENZYMES gets 60%+ even without any tuning; in general you don't even need to tune to reproduce the results.
You could also try the diffpool example at https://github.com/rusty1s/pytorch_geometric/tree/master/examples.
It should give similar results.

The validation accuracy was computed consistently across all experiments, and the same protocol has been adopted by GIN and others. This is mainly due to the small size of some of the datasets. You can of course run separate test-accuracy experiments; you just need to be consistent in the evaluation.
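For reference, here is a minimal sketch of a single DiffPool step using PyTorch Geometric's dense_diff_pool operator from the examples linked above. It assumes a recent torch_geometric install; the tensor shapes and random inputs are only illustrative.

# Minimal sketch of one DiffPool step via PyTorch Geometric's dense_diff_pool.
# Assumes torch and torch_geometric are installed; shapes and inputs are dummies.
import torch
from torch_geometric.nn import dense_diff_pool

B, N, F, C = 8, 50, 32, 5           # batch size, nodes, features, clusters
x = torch.randn(B, N, F)            # dense node features
adj = torch.rand(B, N, N)           # dense adjacency matrices
s = torch.randn(B, N, C)            # soft cluster-assignment logits

# Pool N nodes down to C clusters; also returns the link-prediction and
# entropy auxiliary losses used to regularize the assignments.
x_pooled, adj_pooled, link_loss, ent_loss = dense_diff_pool(x, adj, s)
print(x_pooled.shape, adj_pooled.shape)   # (8, 5, 32), (8, 5, 5)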


meltzerpete commented on August 25, 2024

Hi @RexYing. I love your paper; I think this is a really cool method. I just wanted to ask about how you measure performance for benchmarking.

Could you please clarify the process used here? As far as I can see, it goes as follows:

  • 10-fold cross-validation
  • for each fold, record the best validation score
  • keep the best validation accuracy from each fold and report the mean of these (although in the code it actually looks like the max of the mean validation accuracy across folds is reported?)

Is my understanding correct, or do you also use a separate test set for each fold, chosen based on the validation scores?


RexYing commented on August 25, 2024

Hi, your understanding is correct. The max of the mean is used, and I didn't specify a test set in the code.
Maybe refer to #17 for a bit more detail?
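To make the metric concrete, here is a minimal sketch of the "max of the mean" computation, assuming validation accuracies have been collected into an array of shape (num_epochs, num_folds). The array name and the dummy data are assumptions for illustration, not the repo's actual variables.

# Hypothetical illustration of the "max of the mean" metric discussed above.
# val_acc[e, f] is the validation accuracy at epoch e on fold f; the array
# name, shape, and random values are assumptions for this sketch.
import numpy as np

rng = np.random.default_rng(0)
val_acc = rng.uniform(0.4, 0.7, size=(100, 10))  # (num_epochs, num_folds), dummy data

mean_per_epoch = val_acc.mean(axis=1)  # average the 10 folds at each epoch
reported = mean_per_epoch.max()        # best epoch's mean -> reported accuracy
print(f"reported accuracy: {reported:.4f}")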


meltzerpete commented on August 25, 2024

Thanks, and sorry I did not see #17 - this has answered my question exactly!


Livetrack commented on August 25, 2024

Hi @RexYing,
I am trying to run your code with the script provided in example.sh, but like the OP I get results that do not match the paper: sometimes 0.48 and sometimes 0.56 for the test accuracy. (I am running benchmark_task rather than benchmark_task_val so that I can see the test accuracy; there is no test set in benchmark_task_val.)
Do you have a way to solve this problem?


RexYing commented on August 25, 2024

Hi, the accuracy reported is the mean of the validation accuracy over 10 cross-validation runs. All baselines were run with a hyper-parameter search and evaluated with the same metric.

