
Comments (4)

kwotsin commented on May 26, 2024

Hi @ttaa9 , thanks for your questions! The training splits follow the splits available for each dataset. For example, CIFAR-10 has train and test splits, and similar to many works I used the train split. For STL-10, I used the unlabeled split, which is the most commonly used. As for the scores, they are obtained from a single training run; no retraining of the model to find the best result was done.
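
Concretely, assuming the datasets are fetched via torchvision (the root paths below are placeholders), these splits correspond to the following loaders:

import torchvision

# CIFAR-10: the train split (50,000 images), as used in many GAN works.
cifar10 = torchvision.datasets.CIFAR10(root='/path/to/data', train=True, download=True)

# STL-10: the unlabeled split (100,000 images), the one most commonly used.
stl10 = torchvision.datasets.STL10(root='/path/to/data', split='unlabeled', download=True)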

ttaa9 commented on May 26, 2024

Hi @kwotsin , thanks so much for the quick reply. It might be useful to put this information about splits somewhere in the README, so readers know which splits to evaluate on when comparing against your scores.

Also, the reason I asked about multiple runs is that when I run the GANs myself I don't get quite the same scores. E.g., on CelebA (128 x 128), I get FID/KID of 13.08/0.00956 (versus your 12.93/0.0076) when training on the train set and evaluating on the test set. (It's a bit worse, 13.39/0.010, when evaluating on the training set, unsurprisingly.) The FID is quite close, but the KID is a bit off, so I am wondering whether this is simply stochasticity across training runs or a difference in the training settings. Perhaps you could post your Trainer settings/object in addition to the architectures you currently provide?

kwotsin commented on May 26, 2024

Hi @ttaa9 , no worries! Indeed, the split information is currently listed under the "Baselines" section, which covers all the other datasets tested as well. To clarify, similar to many existing works, the same split was used for both training and evaluation for each dataset. The training settings and architectures are also listed on the README page, and they are the same ones used for the checkpoint.

On the CelebA run, I think your FID score looks correct, with the difference falling within the error interval (which, as you mentioned, is probably due to stochasticity across different training runs). For the KID score, could you check whether the JSON file contains any anomalous readings? For example, my current JSON file for the KID scores has the following values:

[
    0.007495319259681859,
    0.007711712250735898,
    0.007619357938282523
]

I suspect an anomalous reading could affect the KID score significantly. This is not surprising, since I've noticed it can happen even for FID -- e.g. at the same checkpoint, generating with a different random seed can occasionally give a few hundred FID points instead of the 20+ points from the other readings, although this is very rare. I've re-run the evaluation with the given checkpoint and obtained a similar score: 0.007659641506459136 (± 7.556746387021168e-06). Given that your FID is similar to the one I got, I suspect one of your KID readings might be anomalous.
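
As a quick sanity check, one could load the per-seed readings, flag any value far from the median, and report the aggregate in the mean (± std) format above. Here is a minimal sketch, assuming the readings are saved as a flat JSON list like the one shown; the file path and the 50% deviation threshold are placeholders:

import json
import statistics

# Placeholder path to the JSON file of per-seed KID readings.
scores_path = "/path/to/log_dir/kid_scores.json"

with open(scores_path) as f:
    readings = json.load(f)  # e.g. [0.00750, 0.00771, 0.00762]

# Flag readings deviating from the median by more than 50%;
# a single anomalous reading can dominate a 3-seed average.
median = statistics.median(readings)
anomalies = [r for r in readings if abs(r - median) > 0.5 * median]
if anomalies:
    print("Possible anomalous readings:", anomalies)

# Report the aggregate as mean (± std).
mean = statistics.mean(readings)
std = statistics.stdev(readings) if len(readings) > 1 else 0.0
print(f"KID: {mean:.6f} (± {std:.6g})")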

To reproduce the KID scores for CelebA, you can download the checkpoint file and run this minimal script:

import torch
import torch_mimicry as mmc
from torch_mimicry.nets import sngan

# Replace with checkpoint file from CelebA 128x128, SNGAN. https://drive.google.com/open?id=1rYnv2tCADbzljYlnc8Ypy-JTTipJlRyN
ckpt_file = "/path/to/checkpoints/netG/netG_100000_steps.pth"

# Default variables
log_dir = './examples/example_log_celeba'
dataset = 'celeba_128'
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Restore model
netG = sngan.SNGANGenerator128().to(device)
netG.restore_checkpoint(ckpt_file)

# Compute KID over 3 random seeds and collect the scores
scores = []
for seed in range(3):
    score = mmc.metrics.kid_score(num_samples=50000,
                                  netG=netG,
                                  seed=seed,
                                  dataset=dataset,
                                  log_dir=log_dir,
                                  device=device)

    scores.append(score)

print(scores)

Feel free to let me know if this is helpful!

kwotsin commented on May 26, 2024

Closing this issue for now, but feel free to let me know if you have more questions!
