Comments (10)
Excellent, thanks for the very detailed diagnosis. I agree with all your points and the solutions.
iv. I like the naive solution of reducing the number of epochs from 5 to 1, together with a comment in the text explaining that in real applications that number should be increased.
@devendragovil could you do this?
i-iii. These are awesome solutions, but would take more time. Maybe we can deprioritize them for now? (there are a lot of other tasks remaining).
from topomodelx.
@ninamiolane
Yes, I can do this. I can also implement the 3rd solution; I have been working on it independently for some time and should hopefully be able to finish it by Sunday. Would that work?
Independently of this issue, I also wanted to know whether Sunday is a reasonable target for resolving all of the issues assigned to me (or most of them, in case I get totally stuck on one)?
Even better if you can do iii as well, thanks for offering!
Sunday is a perfect target deadline 💯 Thanks for your great and fast work.
Analysis
I have analyzed the runtime of all unit tests. The hypergraph tutorials do indeed take the longest. Here are the run times of the five slowest tests:
| Category | Name | Run Time (sec) |
|---|---|---|
| Hypergraph | DHGCN | 208 |
| Hypergraph | Hypersage | 176 |
| Hypergraph | UniGCNII | 81 |
| Hypergraph | UniGCN | 42 |
| Simplicial | Scone | 27 |
My observations:
- Individual test times are not that outrageous.
- The suite takes really long because all the tests run sequentially.
Deep Dive (DHGCN Tutorial)
All steps take a reasonable amount of time (< 5 secs) except the last one, which is a 5-epoch training run for the DHGCN hypergraph TNN.
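A per-step breakdown like this one can be collected with a small timing helper. The following is only a sketch: the `timed` helper and the placeholder `train` function are hypothetical and stand in for the tutorial's real cells, which are not shown in this thread.

```python
import time
from contextlib import contextmanager

# Collected wall-clock durations, keyed by step label.
timings = {}


@contextmanager
def timed(label):
    """Record how long the enclosed block takes, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def train(num_epochs):
    """Hypothetical stand-in for a tutorial's training loop."""
    for _ in range(num_epochs):
        time.sleep(0.01)  # placeholder for one epoch of real work


with timed("train_5_epochs"):
    train(5)

with timed("train_1_epoch"):
    train(1)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Wrapping each tutorial step in `timed(...)` would show which step dominates the run time.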
Observations
- Individual training times do seem reasonable to me (please correct me if I am wrong). These might speed up with GPU access.
- The environment is built repeatedly, and for the tutorials the libraries are imported repeatedly.
- All tests (until recently) were being run sequentially.
Recommendations/Solutions
- We can arrange a GPU for the test suite. I don't think GitHub Actions provides a runner with a GPU, so we would need to set up and configure our own runner. However, configuration might be time-consuming and hosting a GPU instance might be costly.
- We can do aggressive caching for our environment as well as libraries being imported.
- We can run tests in parallel. Libraries such as pytest-xdist and pytest-split enable this. Since GitHub-hosted runners have only a few cores, we can use the matrix strategy to parallelize across jobs. The tests can also be split by run time into 5-7 (or as many as required) roughly equal partitions. Since the longest test takes about 3.5 minutes (208 s), that is the shortest time parallelization can achieve without changing the tutorials themselves.
- A naive solution: reduce the number of epochs in the tutorials. Reducing the number of epochs in DHGCN from 5 to 1 cuts the run time to roughly a fifth, and the DHGCN tutorial then finishes within a minute.
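To illustrate the time-based split mentioned in the parallelization point, here is a minimal sketch of a greedy "longest test first" partitioning. It assumes only the five run times from the table above (the full suite has more tests), and the function name `partition_by_duration` is hypothetical; it does not show how pytest-split itself works.

```python
import heapq


def partition_by_duration(durations, n_groups):
    """Greedily assign tests to n_groups, always filling the lightest group.

    durations: dict mapping test name -> run time in seconds.
    Returns (groups, totals), where groups[i] lists the tests in group i
    and totals[i] is that group's summed run time.
    """
    # Min-heap of (accumulated seconds, group index).
    heap = [(0.0, i) for i in range(n_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(n_groups)]

    # Place the slowest tests first so they end up in separate groups.
    ordered = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, seconds in ordered:
        total, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (total + seconds, idx))

    totals = [0.0] * n_groups
    for total, idx in heap:
        totals[idx] = total
    return groups, totals


# Run times (seconds) of the five slowest tests reported above.
durations = {
    "DHGCN": 208,
    "Hypersage": 176,
    "UniGCNII": 81,
    "UniGCN": 42,
    "Scone": 27,
}

groups, totals = partition_by_duration(durations, 3)
for tests, total in zip(groups, totals):
    print(f"{total:.0f} s: {tests}")
```

With three groups, the slowest group consists of DHGCN alone, which matches the observation that the longest single test bounds what parallelization can achieve.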
Thanks a lot!
@ninamiolane I fell ill after my travel back from India last week, so couldn't meet the timeline that I gave earlier. Sorry for that! I will try to complete all the issues asap. Thanks a lot for your consideration.
Thanks for the heads-up, and sorry to hear that you feel ill. Stay safe!
@devendragovil any update on this?
@ninamiolane Oh, I am really sorry for the late response. I have raised a PR for this issue; run times are now around 5.5-6 mins. I have been stuck on one thing for a long time that would reduce the overall run time by another 1-1.5 mins, but this PR already delivers most of the reduction.