Comments (10)

ninamiolane avatar ninamiolane commented on May 18, 2024 1

Excellent, thanks for the very detailed diagnosis. I agree with all your points and the solutions.

iv. I like the naive solution of reducing the number of epochs from 5 to 1, together with a comment in the text explaining that in real applications that number should be increased.
@devendragovil could you do this?

i-iii. These are awesome solutions, but would take more time. Maybe we can deprioritize them for now? (there are a lot of other tasks remaining).

from topomodelx.

devendragovil avatar devendragovil commented on May 18, 2024 1

@ninamiolane
Yes, I can do this. I can also implement the 3rd solution; I had been working on it independently for some time and should be able to finish it by Sunday. Would that work?

devendragovil avatar devendragovil commented on May 18, 2024 1

Independently of this issue, is Sunday a reasonable target for resolving all of the issues assigned to me (or most of them, in case I get completely stuck on one)?

ninamiolane avatar ninamiolane commented on May 18, 2024 1

Even better if you can do iii as well, thanks for offering!

Sunday is a perfect target deadline 💯 Thanks for your great and fast work.

devendragovil avatar devendragovil commented on May 18, 2024

@ninamiolane

Analysis

I have analyzed the run time of all unit tests. The hypergraph tutorials do indeed take the longest. Here are the five slowest tests:

Category     Name        Run Time (sec)
Hypergraph   DHGCN       208
Hypergraph   Hypersage   176
Hypergraph   UniGCNII    81
Hypergraph   UniGCN      42
Simplicial   Scone       27

My observations:

  1. Individual test times are not that outrageous.
  2. The suite as a whole takes very long because all the tests run sequentially.
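To make the second observation concrete, here is a small arithmetic sketch using only the five run times listed above. The variable names are illustrative, and the totals cover just these five tests, not the full suite.

```python
# Run times in seconds of the five slowest tests, copied from the list above.
durations = {
    "DHGCN": 208,
    "Hypersage": 176,
    "UniGCNII": 81,
    "UniGCN": 42,
    "Scone": 27,
}

# Running the tests one after another costs the sum of their run times.
sequential_total = sum(durations.values())

# With perfect parallelism, the wall-clock time cannot drop below the slowest test.
parallel_lower_bound = max(durations.values())

print(f"sequential: {sequential_total} s, parallel lower bound: {parallel_lower_bound} s")
```

For these five tests alone, the sequential cost is 534 s (about 9 minutes), while the slowest single test takes 208 s.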

Deep Dive (DHGCN Tutorial)

All steps take a reasonable amount of time (< 5 secs) except the last one, which is a 5-epoch training run for the DHGCN hypergraph TNN.

Observations

  1. Individual training times seem reasonable to me (please correct me if I am wrong). They might speed up with GPU access.
  2. The environment is built repeatedly. For the tutorials, the libraries are also imported repeatedly.
  3. All tests (until recently) were being run sequentially.

Recommendations/Solutions

  1. We can arrange a GPU for the test suite. I don't think GitHub Actions provides a runner with a GPU, so we would need to set up our own runner. However, configuring it might be time-consuming, and hosting a GPU instance might be costly.
  2. We can aggressively cache our environment as well as the libraries being imported.
  3. We can run tests in parallel. Libraries such as pytest-xdist and pytest-split enable this. Since standard GitHub Actions runners have few CPU cores, we can use the matrix strategy for parallelization. The tests can also be split by run time into 5-7 (or as many as required) roughly equally timed partitions. Since the longest test takes about 3.5 minutes (208 secs), that is the shortest wall-clock time parallelization can achieve without changing the tutorials themselves.
  4. A naive solution: reduce the number of epochs in the tutorials. Cutting DHGCN from 5 epochs to 1 reduces its run time to roughly a fifth, so the DHGCN tutorial finishes within a minute.
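The time-based split in point 3 can be sketched as follows. This is a minimal illustration, not the algorithm pytest-split or pytest-xdist actually use: it assumes a simple greedy rule (assign the longest remaining test to the currently lightest group), and the function name `partition_by_duration` is hypothetical. The run times are the five listed in the analysis above.

```python
import heapq


def partition_by_duration(durations, n_groups):
    """Greedily split tests into n_groups with roughly equal total run time.

    Tests are taken from longest to shortest, and each one is assigned to the
    group that currently has the smallest total run time.
    """
    # Min-heap of (current load in seconds, group index).
    heap = [(0, i) for i in range(n_groups)]
    groups = [[] for _ in range(n_groups)]
    by_longest = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, secs in by_longest:
        load, idx = heapq.heappop(heap)  # lightest group so far
        groups[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    loads = [0] * n_groups
    for load, idx in heap:
        loads[idx] = load
    return groups, loads


# Run times in seconds of the five slowest tests from the analysis above.
durations = {"DHGCN": 208, "Hypersage": 176, "UniGCNII": 81, "UniGCN": 42, "Scone": 27}

groups, loads = partition_by_duration(durations, n_groups=3)
for group, load in zip(groups, loads):
    print(f"{load:>4} s  {group}")
```

With three groups, DHGCN ends up alone in its group at 208 s, which matches the point that the slowest single test bounds what parallelization alone can achieve. In a real setup, each group would map to one job of the CI matrix.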

devendragovil avatar devendragovil commented on May 18, 2024

Thanks a lot!

devendragovil avatar devendragovil commented on May 18, 2024

@ninamiolane I fell ill after my travel back from India last week, so couldn't meet the timeline that I gave earlier. Sorry for that! I will try to complete all the issues asap. Thanks a lot for your consideration.

ninamiolane avatar ninamiolane commented on May 18, 2024

Thanks for the heads-up, and sorry to hear that you feel ill. Stay safe!

ninamiolane avatar ninamiolane commented on May 18, 2024

@devendragovil any update on this?

devendragovil avatar devendragovil commented on May 18, 2024

@ninamiolane Oh, I am really sorry for the late response. I have raised a PR for this issue; run times are now around 5.5-6 mins. I have been stuck for a long time on one remaining item that would cut the overall run time by another 1-1.5 mins, but this PR already delivers most of the reduction.
