Comments (5)

EscVM avatar EscVM commented on August 25, 2024

Hi @pankhurivanjani!

Hmm, something is going badly wrong here :)

Could you give more information on what you've done and what you're trying to achieve? It really seems that the Checkpoint class is not able to resolve some objects.

from rams.

adv010 avatar adv010 commented on August 25, 2024

Hi @EscVM,
We're not quite sure either. We've been running multiple experiments to figure out what could be going wrong.
To complicate things further, we don't have direct GPU access: we have to submit our jobs through HTCondor, a high-throughput batch system that dispatches them to a GPU, so we can only observe errors via the logs.

About 2 days ago, we tried training your model for 50 epochs without changing the code. For some reason, no new checkpoint was created, and we don't know why.

Late yesterday, we tried running only 3 epochs, but changed the value passed to evaluate_every in the trainer_rams.fit() call from 400 to 1. This time we got a checkpoint file. When we ran the test with the checkpoint from this modified run, we obtained a prediction and saw none of these warnings.

Do you have any idea why this could be happening? We can't tell whether it's due to the evaluate_every parameter or something else.

adv010 avatar adv010 commented on August 25, 2024

Forgot to add the image: this is the result we obtained after 3 epochs with evaluate_every=1.
[Image: Sample_image_after3epochs]

EscVM avatar EscVM commented on August 25, 2024

As you can see at line 165 of the training.py script, evaluate_every simply triggers the test procedure. Moreover, if the test PSNR is lower than the previous test, the code saves the network's weights. That's it.
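
To make the interaction concrete, here is a minimal sketch of a training loop in which evaluation, and therefore any checkpoint save, only happens every evaluate_every steps. It is not the code from training.py: the function `checkpoint_steps`, the `fake_psnr` score, the `improved` comparison rule, and the assumption that the first evaluation always saves are all hypothetical, chosen only to illustrate why a run with fewer total steps than evaluate_every would never write a checkpoint.

```python
from typing import Callable, List, Optional


def checkpoint_steps(
    total_steps: int,
    evaluate_every: int,
    evaluate: Callable[[int], float],
    should_save: Callable[[float, float], bool],
) -> List[int]:
    """Return the step numbers at which a checkpoint would be written."""
    saved_at: List[int] = []
    best: Optional[float] = None
    for step in range(1, total_steps + 1):
        # (one optimisation step would run here)
        if step % evaluate_every != 0:
            continue  # no evaluation at this step, so nothing can be saved
        score = evaluate(step)
        # Assumption: the very first evaluation always produces a checkpoint.
        if best is None or should_save(score, best):
            best = score
            saved_at.append(step)
    return saved_at


def fake_psnr(step: int) -> float:
    """Hypothetical validation score that changes steadily with training."""
    return 40.0 + 0.01 * step


def improved(new: float, best: float) -> bool:
    """Placeholder rule; swap in whatever comparison training.py applies."""
    return new > best


# Fewer total steps than evaluate_every: evaluation never runs.
print(checkpoint_steps(300, 400, fake_psnr, improved))  # []
# evaluate_every=1: evaluation runs after every step.
print(checkpoint_steps(3, 1, fake_psnr, improved))  # [1, 2, 3]
```

Whether the real step counter is global or reset per epoch, and how many steps one epoch contains, are not stated in this thread, so it is worth checking both against the training logs before concluding that evaluate_every is the cause.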

Why don't you iterate on the code in a Colab session? You could quickly check whether you're missing something.

robmarkcole avatar robmarkcole commented on August 25, 2024

I don't mean to hijack this issue, but I noticed another issue where the user is not even using a GPU. I can confirm that Google Colab can be used for training, with 25 epochs taking approximately 4-5 hours.