Comments (22)
Indeed. Getting the size of git-lfs files is key here. Please see related issue #7, currently being tackled by @Riyabelle25.
Using tqdm sounds good to me 👌
from ersilia.
Hi @Riyabelle25 and @prtk2001 I am very delayed with this - my apologies. Please give me some time, I am aware I am the bottleneck. I hope you are still in.
@Riyabelle25, my approach will be to add checkpoints in the fetching process (simple JSON files stored on disk). I have used workflow managers in the past, but in this case I want to keep it simple (Ersilia already has too many dependencies).
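A minimal sketch of what JSON checkpoints on disk could look like, using only the standard library. The step names, the file layout (`{"done": [...]}`) and the function names are hypothetical illustrations, not Ersilia's actual implementation.

```python
import json
from pathlib import Path

# Hypothetical step names; the real fetch steps are not listed in this thread.
STEPS = ["download", "create_env", "test"]


def load_checkpoint(path: Path) -> list:
    """Return the list of completed step names, or [] if no checkpoint exists."""
    if not path.exists():
        return []
    return json.loads(path.read_text())["done"]


def mark_done(path: Path, step: str) -> None:
    """Record a step as completed in the checkpoint file on disk."""
    done = load_checkpoint(path)
    if step not in done:
        done.append(step)
    path.write_text(json.dumps({"done": done}))


def run_fetch(path: Path, actions: dict) -> list:
    """Run only the steps not yet recorded; return the steps executed now."""
    executed = []
    for step in STEPS:
        if step in load_checkpoint(path):
            continue  # already completed in an earlier run
        actions[step]()
        mark_done(path, step)
        executed.append(step)
    return executed
```

If a fetch is interrupted, calling `run_fetch` again with the same checkpoint path skips the steps already recorded.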
@mahamtariq58 thanks for volunteering. I think that at this point we do not need assistance with this. Many thanks, though!
from ersilia.
Hi @miquelduranfrigola my basic approach is to use tqdm. The problem I'm encountering is fetching the size of files that are tracked through git-lfs: it actually shows the size of the pointer file. I'm trying to figure out a way around that.
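One possible way around the pointer-size problem, as a hedged sketch: a Git LFS pointer is a small text file of `key value` lines, and its `size` line records the byte size of the real object. The function name and the example pointer below are illustrative assumptions, not code from Ersilia.

```python
def lfs_pointer_size(pointer_text: str):
    """Return the object size in bytes recorded in a Git LFS pointer, or None."""
    fields = {}
    for line in pointer_text.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # Every LFS pointer starts with a "version" line holding the spec URL.
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        return None  # not an LFS pointer
    size = fields.get("size", "")
    return int(size) if size.isdigit() else None


# Made-up pointer content for illustration only.
EXAMPLE_POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\n"
    "size 12345\n"
)

print(lfs_pointer_size(EXAMPLE_POINTER))
```

The value returned could then serve as the `total` of a tqdm progress bar while the real file downloads.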
from ersilia.
Hi all!
As an update, @nataliyah123 is working on this issue. We will keep you posted on the progress!
from ersilia.
I would love to contribute to this issue.
from ersilia.
I would like to contribute to this issue @miquelduranfrigola
from ersilia.
I'm currently working on this issue @miquelduranfrigola
from ersilia.
Hello @KundaiChasinda, @prtk2001
Please before continuing the work in this issue, go to issue #36 as this is a required initial step to work on the project!
Thanks!
from ersilia.
Hi @prtk2001 thanks for your interest in this issue. Can you briefly explain to me your approach to the problem? I just want to make sure that your solution corresponds to what we need :)
from ersilia.
One possible workaround would be to calculate the size of the model and store this information as metadata in each model repository, correspondingly. For example, for model eos4e40 (https://github.com/ersilia-os/eos4e40) we could add a metadata.json file containing model size information. What are your thoughts?
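As a rough sketch of this workaround, the size could be computed once from a local copy of the model and written to a JSON file. The key name `size_bytes` and the function names are assumptions made for illustration. Note that on a checkout where git-lfs objects have not been pulled, this would measure the pointer files rather than the real files.

```python
import json
import os
from pathlib import Path


def directory_size_bytes(root) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def write_size_metadata(model_dir, out_path) -> dict:
    """Write the measured size to a JSON file and return the metadata dict."""
    meta = {"size_bytes": directory_size_bytes(model_dir)}
    Path(out_path).write_text(json.dumps(meta, indent=2))
    return meta
```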
from ersilia.
This is a unique approach, I would love to implement it
from ersilia.
@miquelduranfrigola your idea of calculating the size to show how much progress has been made is an interesting approach
from ersilia.
Alright, so let's take this avenue, if you both agree @prtk2001 and @Rufaida98! It will not be straightforward, though.
Many things happen at "fetching" time (ersilia fetch ...), including downloading, creating folders, creating environments, deleting folders, doing tests, etc.
I suggest the following solution:
1. Create a metadata.json file where each step is stored along with the time taken on a standard computer with average internet bandwidth (e.g. Ersilia's workstation).
2. The metadata file can also contain total disk usage.
3. This metadata file can be stored in every model repository.
Then, at fetching time:
4. The first thing we can do is check the metadata.json file.
5. Since this file contains steps and estimated timepoints, we have a way of building a progress bar.
So I suggest the following. @miquelduranfrigola (me) works on points 1,2,3. As soon as I am done, I will notify you. Then @prtk2001 and @Rufaida98 can suggest an approach for 4 and 5. What do you think?
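The metadata-driven progress bar proposed in this comment could be sketched roughly as follows. The JSON layout, step names and timings are invented for illustration; the thread does not specify the actual schema of metadata.json.

```python
import json

# Hypothetical metadata.json content; step names and timings are made up.
EXAMPLE_METADATA = json.dumps({
    "steps": [
        {"name": "download", "seconds": 60},
        {"name": "create_environment", "seconds": 120},
        {"name": "test", "seconds": 20},
    ]
})


def progress_after_each_step(metadata_text: str) -> list:
    """Return (step name, cumulative fraction done) pairs from estimated times."""
    steps = json.loads(metadata_text)["steps"]
    total = sum(step["seconds"] for step in steps)
    elapsed = 0
    progress = []
    for step in steps:
        elapsed += step["seconds"]
        progress.append((step["name"], elapsed / total))
    return progress


def render_bar(fraction: float, width: int = 20) -> str:
    """Render a plain-text progress bar for a fraction between 0 and 1."""
    filled = int(round(fraction * width))
    return "[" + "#" * filled + "-" * (width - filled) + "]"


for name, fraction in progress_after_each_step(EXAMPLE_METADATA):
    print(f"{render_bar(fraction)} {name}")
```

With tqdm, the same idea would amount to creating a bar with `total` set to the sum of the estimated seconds and calling `update()` with each step's estimate as that step completes.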
from ersilia.
@miquelduranfrigola Amazing! I really want to learn and am happy to work with a colleague @prtk2001 :)
from ersilia.
@miquelduranfrigola sounds good, I'm in!
same from this side @Rufaida98 :)
from ersilia.
@miquelduranfrigola I'd love to work with/help you on pts 1,2,3. If you have an approach in mind (aside from this), do share!
from ersilia.
Hi! Hope you are well.
I have completed the installation steps and I am really interested in this issue. Can I start working on this issue as well?
from ersilia.
@miquelduranfrigola apologies for sounding noob-ish 🙈, your approach is to add pythonic checkpoints during cmd ersilia fetch, correct?
This ultimately translates to adding checkpoints during git fetch itself, as that's how we're fetching the model from its repository?
from ersilia.
Opening a PR for the same now, @miquelduranfrigola do take a look 😄
from ersilia.
Following the suggestions from @miquelduranfrigola and @Riyabelle25 for the metadata.json file (I think both of your approaches are amazing): what do you think about checking the metadata file by defining a test function that performs the encoding and outputs the size, using "assert", e.g. (assert len(encode...)==f.tell
Sorry for talking a lot, but let me know your opinions because you are more experienced than me :)
from ersilia.
Hi @Rufaida98 thanks for giving it some thought. Have you checked the open PR thread: #188 ?
I would like to understand what you mean by encoding in this context. I am sure your suggestion makes sense but I don't fully understand it at the moment. Can you elaborate a bit more? Thanks!
from ersilia.
Hi all,
We have decided to temporarily leave this feature as is, see #528
from ersilia.