Thanks for the explanations, much clearer now, and good job on doing a PCA as well!
Please move onto preparing your final application, many thanks!
from ersilia.
Hi @IshitaPathak
Please post an update here on the Week 2 tasks you have marked as done, so we can provide feedback.
MY PROGRESS AND LEARNINGS
So far, I've learned valuable skills for contributing to Ersilia. It's been an exciting journey.
- Learned Docker by Dockerizing a simple app (GitHub repo here), which covered:
  - Writing a Dockerfile
  - Caching layers
  - Publishing to Docker Hub
- Explored Docker Compose, including port mapping and managing environment variables.

I have a strong foundation in Python, but my exposure to its libraries was somewhat limited. To address this, I've spent time learning libraries such as Pandas and NumPy (GitHub repo here). By today, I aim to complete my understanding of Matplotlib and the other libraries needed for my current task. After that, I will move on to the next part of the Week 2 tasks.
Thanks for the explanation. I suggest the following timeline:
- Finish week 2 tasks, including a good explanation of what you have done and your conclusions
- Start working on your final application
As the application period is coming to an end and we want to make sure applicants have time to prepare strong applications, please do not tackle the Week 3 tasks and focus on the final application instead. Thanks!
Motivation Letter
Hi, I am Ishita Pathak, currently a first-year student pursuing a Master of Computer Applications at Indira Gandhi Delhi Technical University for Women, Delhi, India. I am writing to express my genuine excitement about the opportunity to contribute to Ersilia's goal of ensuring that laboratories in less affluent countries have access to cutting-edge AI and ML tools for discovering drugs to treat infectious and neglected diseases.
As a computer science student, I have worked across various tech stacks. However, my current aspiration is to delve deeper into AI/ML, which is also part of my coursework, and Ersilia's project offers a chance to apply my skills and knowledge to real-world challenges. As a quick learner, I'm ready to dedicate the time and effort needed to achieve these goals and to learn new things along the way.
Six years ago, I went through a tough time when someone very close to me passed away because they couldn't get the medical help they needed in time. It deeply affected me and sparked a strong desire to make a difference in healthcare. I believe that contributing my technical skills to Ersilia is the best way for me to do that, and I am confident that I can help advance healthcare solutions and, ultimately, save lives.
Why me?
My passion for open source and my never-give-up attitude set me apart. I've always felt that working in open source and helping others is my way of doing good for society; through this project, I'll not only be able to give back to the community but also potentially help save lives. I am excited about the opportunity to work on this project and will work as hard as needed to make it a great success.
Thanks and Regards
Ishita Pathak
Week 1 TASK ✅
After installing the Ersilia Model Hub, I tested it with a simple model:
```bash
ersilia -v fetch eos3b5e
ersilia serve eos3b5e
ersilia -v api run -i "CCCC"
```
Output
Testing Ersilia with Docker
```bash
docker pull ersiliaos/eos4wt0:latest
ersilia serve eos4wt0
ersilia -v api run -i "CCCC"
```
Output
While completing the task, I got stuck when testing the Ersilia model eos3b5e: the container always ended up in Exited status. I asked about this in the Slack channel, where a mentor helped me resolve the issue.
I truly appreciate the supportive environment within the community, where both mentors and peers are always ready to lend a helping hand.
Week 2 TASK ✅
- Chose the hERG model "eos30gr" from the list of suggested models in GitBook
- Read the publication to better understand the model.
Model Overview
The hERG channel helps regulate the electrical signals in the heart. When certain drugs block this channel, they can cause a condition known as long QT syndrome, which can lead to dangerous heart rhythm abnormalities.
To identify which drugs might have this effect, the authors of the publication developed a computational model called deephERG, which Ersilia makes available as eos30gr. The model uses deep neural networks trained on large datasets covering thousands of chemicals; by learning from the chemical structures and properties of these compounds, deephERG predicts how likely a compound is to block the hERG channel.
- Ensured model functionality on my system by downloading, serving, and running it using the following commands:
```bash
ersilia -v fetch eos30gr
ersilia serve eos30gr
ersilia -v api run -i "CCCC"
```
After fetching the eos30gr model, I consistently got null outputs for the SMILES predictions. Since models are updated regularly, I re-fetched it with `ersilia -v fetch eos30gr --from_github` to get the latest code from GitHub, which resolved the issue.
Output
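A minimal sketch of how null predictions like these could be spotted in a saved output file. It assumes an Ersilia-style CSV whose identifier columns are named "key" and "input" and whose remaining columns hold predictions; these column names (and "activity" in the sample data) are assumptions, so adjust them to the actual file. The function only inspects CSV text and does not call Ersilia.

```python
import csv
import io


def find_null_rows(csv_text, id_columns=("key", "input")):
    """Return the "input" value of every row whose prediction cells are all null.

    Assumes identifier columns named in id_columns; every other column is
    treated as a prediction column.
    """
    null_markers = {"", "null", "none", "nan"}
    reader = csv.DictReader(io.StringIO(csv_text))
    null_rows = []
    for row in reader:
        # Skip identifier columns and the None key DictReader uses for extra fields.
        predictions = [v for k, v in row.items() if k not in id_columns and k is not None]
        # A row counts as null when every prediction cell is empty or a null marker.
        if all((v or "").strip().lower() in null_markers for v in predictions):
            null_rows.append(row.get("input"))
    return null_rows
```

For example, `find_null_rows(open("output.csv").read())` should return an empty list when every molecule received a prediction.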
- Next, I studied the repository structure from the provided example and created a GitHub repository with all the necessary files.
Thank you so much @GemmaTuron for the guidance and the timeline. I'm committed to finishing the Week 2 tasks and starting work on my final application right away.
- Selected the list of 1000 molecules (reference_library.csv) shared in Slack (data channel). To keep the data consistent, I standardized the SMILES representations using the function from src. RDKit flagged three of the SMILES as invalid, which produced NaN values, so I removed those entries from the dataset.
- Next, I obtained the InChIKey for each standardized SMILES and created a DataFrame with two columns, "smiles" and "InChI_key", containing the processed SMILES and their corresponding InChIKeys. I then saved this processed data as a CSV file named processed_input.csv.
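The cleaning steps above can be sketched in plain Python as follows. This is not the standardization function from src: `standardize` and `to_inchikey` are hypothetical parameters, so any chemistry toolkit can be plugged in. With RDKit, `standardize` could wrap `Chem.MolFromSmiles` (which returns None for unparsable SMILES) followed by `Chem.MolToSmiles`, and `to_inchikey` could use `Chem.MolToInchiKey`.

```python
import csv
import io
import math


def build_processed_rows(smiles_list, standardize, to_inchikey):
    """Standardize SMILES, drop invalid ones, and pair each with its InChIKey.

    standardize(smiles) is assumed to return a SMILES string, or None / NaN
    when the input cannot be parsed; to_inchikey(smiles) returns an InChIKey.
    """
    rows, dropped = [], []
    for smi in smiles_list:
        std = standardize(smi)
        # Invalid SMILES come back as None or as a float NaN: record and skip them.
        if std is None or (isinstance(std, float) and math.isnan(std)):
            dropped.append(smi)
            continue
        rows.append({"smiles": std, "InChI_key": to_inchikey(std)})
    return rows, dropped


def to_csv_text(rows):
    """Serialize the two-column table ("smiles", "InChI_key") as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["smiles", "InChI_key"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned text can then be written to processed_input.csv; keeping the dropped inputs makes it easy to report how many SMILES were removed.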
After cleaning the data and obtaining the corresponding InChIKeys, I ran the model on the processed dataset using the following commands:
```bash
ersilia -v fetch eos30gr --from_github
ersilia serve eos30gr
ersilia -v api run -i processed_input.csv -o output.csv
```
The output generated by the model is saved in the file output.csv.
- I used the predictions from the Ersilia Model Hub to create the plots needed to see how they are distributed.

The scatter plot shows significant overlap between the two classes, which makes them hard to distinguish. This overlap suggests that the features used for classification may not be distinctive enough; without a clear separation between the classes, the model may struggle to differentiate hERG blockers from non-blockers.
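To put a number on the overlap seen in the scatter plot, one option (not part of the original notebook) is a histogram overlap coefficient between the two classes along a single coordinate, for example the first principal component. The sketch assumes that coordinate is already available for the molecules in each class; `n_bins=10` is an arbitrary choice. Values near 1 mean the classes are nearly indistinguishable along that axis, and values near 0 mean they are well separated.

```python
def overlap_coefficient(values_a, values_b, n_bins=10):
    """Histogram overlap of two samples: 0 = fully separated, 1 = identical.

    Both samples are binned on a shared range; the score is the sum over bins
    of the smaller of the two normalized bin frequencies.
    """
    if not values_a or not values_b:
        raise ValueError("both samples must be non-empty")
    lo = min(min(values_a), min(values_b))
    hi = max(max(values_a), max(values_b))
    width = (hi - lo) / n_bins or 1.0  # fall back to width 1 when all values are equal

    def histogram(values):
        counts = [0] * n_bins
        for v in values:
            # The maximum value would fall just past the last bin, so clamp it.
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
        return [c / len(values) for c in counts]

    freq_a, freq_b = histogram(values_a), histogram(values_b)
    return sum(min(a, b) for a, b in zip(freq_a, freq_b))
```

The score depends on the bin count, so it is best used to compare feature sets under the same settings rather than as an absolute measure.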
Completed Week 2 Task 1. Here is the link to the notebook for this task: 00_model_bias.ipynb
WEEK 2 TASK 2
- Selected Table 6 from this repo, provided in the publication on page 32, where the authors compiled 1,824 FDA-approved small-molecule drugs from the DrugBank database. I standardized the SMILES and removed null and duplicate values, leaving 1,722 molecules.
- I ran the model on the dataset using the following commands:
```bash
ersilia -v fetch eos30gr
ersilia serve eos30gr
ersilia -v api run -i input_week2_task2.csv -o output_week2_task2.csv
```
- I then compared the results reported in the publication with those generated by the eos30gr model, to determine whether both sources produce similar results.
From the graphs above, there is a clear difference between the results reported in the publication and those from the Ersilia Model Hub. This inconsistency suggests that the eos30gr model may not be reproducible.
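Aggregate comparisons can hide disagreements: two sources can report similar totals while labeling individual molecules differently. A sketch of a per-molecule check, assuming both result sets have already been loaded into dictionaries keyed by the same identifier (for example the InChIKey) with "Yes"/"No" labels; the function name and label values are illustrative, and model probabilities would first need converting to labels with some chosen cutoff.

```python
from collections import Counter


def compare_labels(published, predicted):
    """Per-molecule comparison of two label sets keyed by the same identifier.

    Returns the agreement rate over shared identifiers and a Counter of
    (published_label, predicted_label) pairs, i.e. a small confusion table.
    """
    shared = published.keys() & predicted.keys()
    if not shared:
        raise ValueError("the two sources share no identifiers")
    pairs = Counter((published[key], predicted[key]) for key in shared)
    agreed = sum(count for (pub, pred), count in pairs.items() if pub == pred)
    return agreed / len(shared), pairs
```

Identifiers present in only one source are ignored, so it is worth also checking how many molecules each side lost during standardization.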
Percentage of hERG blockers and non-blockers in the publication results:

| Blockers | Number | Percentage |
|---|---|---|
| Yes (hERG blockers) | 513 | 29.79% |
| No (non-blockers) | 1209 | 70.21% |

Percentage of hERG blockers and non-blockers predicted by the model:

| Blockers | Number | Percentage |
|---|---|---|
| Yes (hERG blockers) | 411 | 23.87% |
| No (non-blockers) | 1311 | 76.13% |
These percentages also show a discrepancy: the model predicts 411 blockers (23.87%), compared with 513 (29.79%) in the publication, a gap of about 5.9 percentage points. This points to problems with reproducing the published results, so I conclude that model eos30gr is not reproducible.
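The percentages in the tables, and the gap between them, can be recomputed from the raw counts with a few lines of Python (a small sketch; the dictionary keys are illustrative):

```python
def class_percentages(counts):
    """Convert class counts into percentages of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Counts taken from the two tables above.
publication = {"Yes": 513, "No": 1209}
model = {"Yes": 411, "No": 1311}

pub_pct = class_percentages(publication)
model_pct = class_percentages(model)
# Difference in the share of predicted blockers, in percentage points.
gap = round(pub_pct["Yes"] - model_pct["Yes"], 2)
print(pub_pct, model_pct, gap)
```

Both tables cover the same 1,722 molecules, so the gap reflects different labels rather than a different dataset size.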
Here is the link to the GitHub repository.
WEEK 3 TASK
Selected a suitable dataset with sufficient experimental results, named external_dataset_Xaio_Li.csv, in the data folder.
Here is the reference for the data; I used the Li 1092 test data.
Thank you so much @GemmaTuron. I really appreciate your time and feedback. I have started working on my final application.
WEEK 4 TASK ✅
- Created final application and received feedback from mentor.
- Submitted the final application on the Outreachy website.