Comments (12)
@miquelduranfrigola ...my opinion on this would be to allow users to contribute their precalculations, but not directly. They have to be properly moderated and verified before they can be added to the public file. I'm suggesting this because user contributions will eventually enhance the models. If it's possible, a user public file could be created for users to contribute their precalculations. After a contribution has been verified, it can then be moved to the main public file.
Hello @miquelduranfrigola , I am interested in working on this issue. Can I please be assigned to it?
@ashaby4000, your idea sounds good.
@busiratazeez, we share the same idea. Allowing users to contribute their precalculations would go a long way toward enhancing the models. @miquelduranfrigola, my concern (a potential risk) is that these calculations may not be accurate, hence the need for verification before they are added to the public file.
Hi @busiratazeez and @resiato, thanks for your excellent feedback. I did some thinking on all of this a while ago. You can see a scheme I drew back in the day: s3-hdf5-scheme.pdf (please don't worry if you don't understand it! Bottom line is... it's complicated :))
Our colleague @JHlozek has done a lot of thinking about it too. Our current solution involves DVC. Before we jump into coding: are you familiar with it?
To give space to everyone to contribute, I am assigning the issue to @resiato and @busiratazeez, but please @loweyvana and @Rufaida98 feel welcome to jump in!
@miquelduranfrigola Honestly, I can't say I have encountered DVC before, but I'm open to learning the ins and outs of this system.
@miquelduranfrigola I'm a beginner in coding, but I checked out DVC and I understand why it's the current solution. It provides pointers to the large files, but it definitely does not solve the issue of verifying what a user uploads. I'm willing to dive into brainstorming alongside the team and see what we can come up with.
@miquelduranfrigola I'll be happy to learn new things and apply them.
Hi all.
@busiratazeez I agree with your comments. DVC is just the back-end storage facilitator and here we need to add an extra layer of functionality where users can opt to contribute data that is verified. I think the idea of staging them first (as you suggest) is prudent.
Some points to think about for implementation of data contribution:
- Where and how is the contributed data staged? Is this also done through DVC with a pointer to the staging H5 file or is there a better approach?
- How to efficiently evaluate our confidence in contributed data? I doubt we need/want to recalculate every data point.
- What is our tolerance for data variability? If there is some variability in a calculation for the same input, how do we intelligently decide that two values agree or disagree? Do we require 100% agreement when we validate this data?
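To make the last two points more concrete, here is a minimal Python sketch of spot-check verification for a staged contribution: instead of recalculating every data point, it recomputes a random sample and compares values within a tolerance. Everything here is an assumption for illustration, not an existing Ersilia API: contributed data is modelled as a plain dict of input identifier to scalar value, `recompute` is a hypothetical stand-in for running the model, and the tolerances and sample size are placeholder numbers.

```python
import math
import random
from typing import Callable, Dict, List


def spot_check(
    contributed: Dict[str, float],
    recompute: Callable[[str], float],  # hypothetical: runs the model on one input
    sample_size: int = 10,
    rel_tol: float = 1e-3,  # placeholder tolerance for calculation variability
    abs_tol: float = 1e-6,
    seed: int = 0,
) -> List[str]:
    """Recompute a random sample of contributed values; return keys that disagree."""
    rng = random.Random(seed)  # fixed seed so a review can be reproduced
    keys = sorted(contributed)  # sort so sampling does not depend on dict order
    sample = rng.sample(keys, min(sample_size, len(keys)))
    mismatches = []
    for key in sample:
        # Two values "agree" if they are within the relative OR absolute tolerance.
        if not math.isclose(contributed[key], recompute(key), rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append(key)
    return mismatches


def accept_contribution(
    contributed: Dict[str, float],
    recompute: Callable[[str], float],
    max_mismatch_fraction: float = 0.0,  # 0.0 means every sampled row must agree
    sample_size: int = 10,
    **kwargs,
) -> bool:
    """Decide whether a staged contribution could be promoted to the public file."""
    n_checked = min(sample_size, len(contributed))
    if n_checked == 0:
        return False  # nothing to verify, so nothing to accept
    mismatches = spot_check(contributed, recompute, sample_size=sample_size, **kwargs)
    return len(mismatches) / n_checked <= max_mismatch_fraction
```

Setting `max_mismatch_fraction=0.0` corresponds to requiring 100% agreement on the sampled rows (within tolerance), while a small positive value would tolerate occasional disagreement. The sample size is the knob that trades review cost against confidence.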
@busiratazeez I like the points you made. @miquelduranfrigola The best approach would be to let users make their contributions to their local files. I believe DVC would track any changes made to the calculations and would also let users roll back calculations if any mistakes are made. This way, users could review their precalculations and make changes before pushing them to the public files.
@miquelduranfrigola I would love to be part of contributing to this issue.
Thanks all for the feedback. All noted.
@JHlozek thanks for these excellent points.
Gathering your points, we are now consulting with AWS and awaiting their feedback. Will keep you updated. I anticipate this will be a "slow" thread.
Closing this issue for now. We might eventually resort to DynamoDB.