What? Testing the tutorials on hypergraphs takes ~15 minutes, wher


Diagnose & Speed-up Hypergraph tutorials (topomodelx, closed)

pyt-team commented on May 18, 2024

Diagnose & Speed-up Hypergraph tutorials

from topomodelx.

Comments (10)

ninamiolane commented on May 18, 2024 1

Excellent, thanks for the very detailed diagnosis. I agree with all your points and the solutions.

iv. I like the naive solution of reducing the number of epochs from 5 to 1, together with a comment in the text explaining that in real applications that number should be increased.
@devendragovil could you do this?

i-iii. These are awesome solutions, but would take more time. Maybe we can deprioritize them for now? (there are a lot of other tasks remaining).

devendragovil commented on May 18, 2024 1

@ninamiolane
Yes, I can do this. I can also implement the 3rd solution; I was already working on it independently and should be able to finish it by Sunday. Would that work?

devendragovil commented on May 18, 2024 1

Independently of this issue, I also wanted to know whether Sunday is a reasonable target for resolving all of the issues assigned to me (or most of them, in case I get completely stuck on one).

ninamiolane commented on May 18, 2024 1

Even better if you can do iii as well, thanks for offering!

Sunday is a perfect target deadline 💯 Thanks for your great and fast work.

devendragovil commented on May 18, 2024

@ninamiolane

Analysis

I have analyzed the run time of all unit tests. The hypergraph tutorials do indeed take the longest. Here are the run times of the 5 longest tests:

Category	Name	Run Time (sec)
Hypergraph	DHGCN	208
Hypergraph	Hypersage	176
Hypergraph	UniGCNII	81
Hypergraph	UniGCN	42
Simplicial	Scone	27

My observations:

Individual test times are not that outrageous.
The suite takes so long because all the tests run sequentially.
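
To make the second observation concrete, here is a small sketch that adds up the five run times from the table above and compares the sum with the longest single test. It covers only the five tests listed there, not the whole suite, and the variable names are illustrative.

```python
# Run times (seconds) of the five longest tests, copied from the table above.
run_times_sec = {
    "DHGCN": 208,
    "Hypersage": 176,
    "UniGCNII": 81,
    "UniGCN": 42,
    "Scone": 27,
}

# Running sequentially costs the sum of all run times.
sequential_total = sum(run_times_sec.values())

# With ideal parallelism, the longest single test is the floor.
longest = max(run_times_sec.values())

print(f"sequential: {sequential_total} s ({sequential_total / 60:.1f} min)")
# prints "sequential: 534 s (8.9 min)"
print(f"longest single test: {longest} s ({longest / 60:.1f} min)")
# prints "longest single test: 208 s (3.5 min)"
```

These five tests alone account for almost 9 minutes when run back to back, while no single one of them exceeds 3.5 minutes.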

Deep Dive (DHGCN Tutorial)

All steps take a reasonable amount of time (< 5 secs) except the last one, which is a 5-epoch training run for the DHGCN hypergraph TNN.

Observations

Individual training times seem reasonable to me (please correct me if I am wrong). They might speed up with GPU access.
The environment is built repeatedly. For the tutorials, the libraries are imported repeatedly.
All tests (until recently) were being run sequentially.

Recommendations/Solutions

i. We can arrange a GPU for the test suite. I don't think GitHub Actions provides a runner with a GPU, so we would need to set up our own runner. However, configuring it might be time-consuming, and hosting a GPU instance might be costly.
ii. We can aggressively cache the environment as well as the imported libraries.
iii. We can run the tests in parallel. Libraries such as pytest-xdist and pytest-split enable this. Since GitHub Actions runners have only a few cores, we can use the matrix strategy for parallelization. The tests can also be split by run time into 5-7 (or as many as required) equally timed partitions. Since the longest test takes just over 3 minutes, that is the shortest time parallelization can achieve without changing the tutorials themselves.
iv. A naive solution: reduce the number of epochs in the tutorials. Reducing the number of epochs in DHGCN from 5 to 1 cuts the run time to roughly a fifth, and the DHGCN tutorial finishes within a minute.
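
As a rough illustration of the duration-based split in solution iii, the sketch below uses a greedy heuristic: sort the tests by run time and always hand the next one to the partition with the smallest total so far. It is not how pytest-split or the eventual PR does it. The function name `partition_by_duration` and the choice of 3 partitions are assumptions made for the example, and only the five timings from the table are used.

```python
import heapq


def partition_by_duration(durations, n_groups):
    """Greedily assign tests to n_groups, longest test first.

    Hypothetical helper for illustration only.
    Returns (groups, totals): the test names per group and each group's total time.
    """
    # Min-heap of (current total time, group index).
    heap = [(0, idx) for idx in range(n_groups)]
    groups = [[] for _ in range(n_groups)]

    # Longest tests first, so the big ones are spread out early.
    for name, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)  # least-loaded group so far
        groups[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))

    totals = [sum(durations[name] for name in group) for group in groups]
    return groups, totals


# Run times (seconds) from the analysis table; the real suite has more tests.
durations = {"DHGCN": 208, "Hypersage": 176, "UniGCNII": 81, "UniGCN": 42, "Scone": 27}

groups, totals = partition_by_duration(durations, n_groups=3)
for group, total in zip(groups, totals):
    print(f"{total:>4} s  {group}")
```

With these five timings, the slowest partition is the one containing DHGCN alone at 208 s, which matches the point above: without shortening the tutorials themselves, the wall-clock time cannot drop below the longest single test.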

devendragovil commented on May 18, 2024

Thanks a lot!

devendragovil commented on May 18, 2024

@ninamiolane I fell ill after my travel back from India last week, so couldn't meet the timeline that I gave earlier. Sorry for that! I will try to complete all the issues asap. Thanks a lot for your consideration.

ninamiolane commented on May 18, 2024

Thanks for the heads-up, and sorry to hear that you feel ill. Stay safe!

ninamiolane commented on May 18, 2024

@devendragovil any update on this?

devendragovil commented on May 18, 2024

@ninamiolane Oh, I am really sorry for the late response. I have raised a PR for this issue; run times are now around 5.5-6 minutes. I have been stuck on one remaining item for a long time, which would cut the overall run time by another 1-1.5 minutes, but this PR already removes most of the excess time.
