
Comments (22)

miquelduranfrigola avatar miquelduranfrigola commented on July 19, 2024 2

Indeed. Getting size of git-lfs files is key here. Please see related issue #7 currently being tackled by @Riyabelle25

Using tqdm sounds good to me 👌

from ersilia.

miquelduranfrigola avatar miquelduranfrigola commented on July 19, 2024 2

Hi @Riyabelle25 and @prtk2001 I am very delayed with this - my apologies. Please give me some time, I am aware I am the bottleneck. I hope you are still in.

@Riyabelle25, my approach will be to add checkpoints in the fetching process (simple JSON files stored on disk). I have used workflow managers in the past, but in this case I want to keep it simple (Ersilia already has too many dependencies).
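
To make the checkpoint idea concrete, here is a minimal sketch using only the standard library. The file name `checkpoints.json`, the `done` key, and the helper names are assumptions for illustration, not Ersilia's actual code. Each completed step is recorded on disk, so an interrupted fetch can skip what has already finished.

```python
import json
import tempfile
from pathlib import Path


def load_done(path: Path) -> set:
    """Return the set of step names already recorded in the checkpoint file."""
    if path.exists():
        return set(json.loads(path.read_text())["done"])
    return set()


def mark_done(path: Path, step: str) -> None:
    """Record a finished step in the checkpoint file (hypothetical layout)."""
    done = load_done(path)
    done.add(step)
    path.write_text(json.dumps({"done": sorted(done)}))


def run_steps(steps, path: Path) -> list:
    """Run (name, callable) pairs in order, skipping steps already checkpointed."""
    executed = []
    for name, func in steps:
        if name in load_done(path):
            continue
        func()
        mark_done(path, name)
        executed.append(name)
    return executed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "checkpoints.json"
        steps = [("download", lambda: None), ("create_env", lambda: None)]
        print(run_steps(steps, checkpoint))  # first run executes both steps
        print(run_steps(steps, checkpoint))  # second run finds nothing left to do
```

A plain JSON file keeps the dependency count unchanged, which is the point of avoiding a workflow manager here.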

@mahamtariq58 thanks for volunteering. I think that at this point we do not need assistance with this. Many thanks, though!

from ersilia.

prtk2001 avatar prtk2001 commented on July 19, 2024 1

Hi @miquelduranfrigola my basic approach is to use tqdm. The problem I'm running into is getting the size of files that are stored through git-lfs: what I get back is the size of the pointer file, not of the actual file. I'm trying to figure out a way around that.
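
One possible way around the pointer problem, as a rough sketch: a Git LFS pointer is a small text file that itself contains a `size` line with the real object size in bytes, so the size can be read from the pointer without downloading the object. The function name `lfs_pointer_size` is made up for this example, and how the pointer text is obtained (e.g. from a checkout without LFS objects pulled) is left open.

```python
from typing import Optional

LFS_VERSION_PREFIX = "version https://git-lfs.github.com/spec/"


def lfs_pointer_size(text: str) -> Optional[int]:
    """Return the object size recorded in a Git LFS pointer, or None if not a pointer."""
    lines = text.splitlines()
    # A pointer file always starts with the "version" line.
    if not lines or not lines[0].startswith(LFS_VERSION_PREFIX):
        return None
    for line in lines[1:]:
        key, _, value = line.partition(" ")
        if key == "size" and value.isdigit():
            return int(value)
    return None


if __name__ == "__main__":
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:0000000000000000000000000000000000000000000000000000000000000000\n"
        "size 12345\n"
    )
    print(lfs_pointer_size(pointer))
```

The returned value could then be used as the `total` of a tqdm bar.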

from ersilia.

GemmaTuron avatar GemmaTuron commented on July 19, 2024 1

Hi all!

As an update, @nataliyah123 is working on this issue. We will keep you posted on the progress!

from ersilia.

victorabba avatar victorabba commented on July 19, 2024

I would love to contribute to this issue.

from ersilia.

KundaiChasinda avatar KundaiChasinda commented on July 19, 2024

I would like to contribute to this issue @miquelduranfrigola

from ersilia.

prtk2001 avatar prtk2001 commented on July 19, 2024

I'm currently working on this issue @miquelduranfrigola

from ersilia.

GemmaTuron avatar GemmaTuron commented on July 19, 2024

Hello @KundaiChasinda, @prtk2001

Before continuing work on this issue, please go to issue #36, as it is a required initial step for working on the project!
Thanks!

from ersilia.

miquelduranfrigola avatar miquelduranfrigola commented on July 19, 2024

Hi @prtk2001 thanks for your interest in this issue. Can you briefly explain to me your approach to the problem? I just want to make sure that your solution corresponds to what we need :)

from ersilia.

miquelduranfrigola avatar miquelduranfrigola commented on July 19, 2024

One possible workaround would be to calculate the size of the model and store this information as metadata in each model repository, correspondingly. For example, for model eos4e40 (https://github.com/ersilia-os/eos4e40) we could add a metadata.json file containing model size information. What are your thoughts?
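
As a rough sketch of what that could look like, assuming a local checkout of the model repository: the snippet below sums file sizes under a folder and writes them to `metadata.json`. The keys `model_id` and `size_bytes` are placeholders made up for this example, not an agreed schema.

```python
import json
import os
import tempfile
from pathlib import Path

METADATA_NAME = "metadata.json"


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root, skipping .git and the metadata file."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # do not descend into git internals
        for name in filenames:
            path = Path(dirpath) / name
            if name == METADATA_NAME or path.is_symlink():
                continue
            total += path.stat().st_size
    return total


def write_size_metadata(root: Path, model_id: str) -> dict:
    """Write a metadata.json with the measured size (hypothetical keys)."""
    metadata = {"model_id": model_id, "size_bytes": directory_size_bytes(root)}
    (root / METADATA_NAME).write_text(json.dumps(metadata, indent=2))
    return metadata


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "model.bin").write_bytes(b"\x00" * 1024)
        print(write_size_metadata(repo, "eos4e40"))
```

Note that if the git-lfs objects have not been pulled, this measures the pointer files rather than the real files, so it should be run on a fully fetched checkout.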

from ersilia.

prtk2001 avatar prtk2001 commented on July 19, 2024

This is a unique approach, I would love to implement it

from ersilia.

Rufaida98 avatar Rufaida98 commented on July 19, 2024

@miquelduranfrigola your idea of calculating the size to show how much progress has been made is an interesting approach

from ersilia.

miquelduranfrigola avatar miquelduranfrigola commented on July 19, 2024

Alright, so let's take this avenue, if you both agree @prtk2001 and @Rufaida98! It will not be straightforward, though.

Many things happen at "fetching" time (ersilia fetch ...), including downloading, creating folders, creating environments, deleting folders, doing tests, etc.

I suggest the following solution:

  1. Create a metadata.json file where each step is stored along with the time taken on a standard computer with average internet bandwidth (e.g. Ersilia's workstation).
  2. The metadata file can also contain total disk usage.
  3. This metadata file can be stored in every model repository.

Then, at fetching time:

  4. The first thing we can do is check the metadata.json file.
  5. Since this file contains steps and estimated timepoints, we have a way of building a progress bar.
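
The fetching-time steps above could be sketched roughly as follows. The metadata layout (a `steps` list with `name` and `seconds`) is an assumption, since the actual format has not been defined yet, and the bar is drawn with plain text so the example has no dependencies; tqdm could replace `render_bar`.

```python
import json

# Hypothetical metadata.json content; the schema is an assumption for this sketch.
EXAMPLE_METADATA = json.loads("""
{
  "steps": [
    {"name": "download", "seconds": 60},
    {"name": "create_environment", "seconds": 30},
    {"name": "test", "seconds": 10}
  ]
}
""")


def progress_fractions(metadata: dict) -> list:
    """Return (step name, cumulative fraction of estimated total time) per step."""
    steps = metadata["steps"]
    total = sum(step["seconds"] for step in steps)
    elapsed = 0
    fractions = []
    for step in steps:
        elapsed += step["seconds"]
        fractions.append((step["name"], elapsed / total))
    return fractions


def render_bar(fraction: float, width: int = 20) -> str:
    """Draw a text progress bar for a fraction between 0 and 1."""
    filled = round(fraction * width)
    return "[" + "#" * filled + "." * (width - filled) + f"] {fraction:.0%}"


if __name__ == "__main__":
    for name, fraction in progress_fractions(EXAMPLE_METADATA):
        # In a real fetch this line would be printed when the step finishes.
        print(f"{render_bar(fraction)} {name}")
```

Weighting by estimated time rather than by step count keeps the bar from jumping when one step (typically the download) dominates.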

So I suggest the following. @miquelduranfrigola (me) works on points 1,2,3. As soon as I am done, I will notify you. Then @prtk2001 and @Rufaida98 can suggest an approach for 4 and 5. What do you think?

from ersilia.

Rufaida98 avatar Rufaida98 commented on July 19, 2024

@miquelduranfrigola Amazing! I really want to learn, and I'm happy to work with a colleague, @prtk2001 :)

from ersilia.

prtk2001 avatar prtk2001 commented on July 19, 2024

@miquelduranfrigola sounds good, I'm in!
Same from this side, @Rufaida98 :)

from ersilia.

Riyabelle25 avatar Riyabelle25 commented on July 19, 2024

@miquelduranfrigola I'd love to work with/help you on points 1, 2 and 3. If you have an approach in mind (aside from this), do share!

from ersilia.

mahamtariq58 avatar mahamtariq58 commented on July 19, 2024

Hi ! hope you are well.
I have completed the installation steps and I am really interested in this issue. Can I start working on this issue as well?

from ersilia.

Riyabelle25 avatar Riyabelle25 commented on July 19, 2024

@miquelduranfrigola apologies for sounding noob-ish 🙈, your approach is to add Pythonic checkpoints during the `ersilia fetch` command, correct?
Does this ultimately translate to adding checkpoints during git fetch itself, as that's how we're fetching the model from its repository?

from ersilia.

Riyabelle25 avatar Riyabelle25 commented on July 19, 2024

[Screenshot 2022-04-08 at 5:06:01 PM]

Aight, so this is how I'm calculating the time taken for each step (defined in `fetch.py`) and showing the same in the CLI.
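
For readers without the screenshot, per-step timing can be done along these lines. This is a generic sketch, not the code from the PR: the `timed` helper and the step names are made up for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name: str, timings: dict):
    """Measure the wall-clock time of the enclosed block and store it under name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


if __name__ == "__main__":
    timings = {}
    with timed("download", timings):
        time.sleep(0.05)  # stand-in for a real fetch step
    with timed("create_environment", timings):
        time.sleep(0.02)  # stand-in for a real fetch step
    for step, seconds in timings.items():
        print(f"{step}: {seconds:.2f}s")
```

The collected `timings` dictionary is also what would feed the per-step estimates in a metadata file.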

Opening a PR for the same now, @miquelduranfrigola do take a look 😄

from ersilia.

Rufaida98 avatar Rufaida98 commented on July 19, 2024

Following the suggestions from @miquelduranfrigola and @Riyabelle25 for the metadata.json file (I think both of your approaches are amazing): what do you think about checking the metadata file by defining a test function that performs the encoding and checks the size output using "assert", e.g. (assert len(encode...)==f.tell
Sorry for talking a lot, but let me know your opinions, because you are more experienced than me :)

from ersilia.

miquelduranfrigola avatar miquelduranfrigola commented on July 19, 2024

Hi @Rufaida98 thanks for giving it some thought. Have you checked the open PR thread, #188?
I would like to understand what you mean by encoding in this context. I am sure your suggestion makes sense, but I don't fully understand it at the moment. Can you elaborate a bit more? Thanks!

from ersilia.

GemmaTuron avatar GemmaTuron commented on July 19, 2024

Hi all,

We have decided to temporarily leave this feature as is, see #528

from ersilia.
