Comments (3)
One solution would be to move peak processing from Dataset to Parser in depthcharge. That way invalid spectra are encountered prior to being added to the index and can immediately be filtered out, rather than when being retrieved from the dataset. As an added advantage, spectrum processing only needs to be done once for each spectrum, when creating the index, rather than repeating it every time the same spectrum is retrieved.
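As a rough illustration of the idea (not depthcharge's actual API), the sketch below processes each spectrum once while building the index and drops invalid spectra right away. The names `process_peaks` and `build_index`, the m/z range, the minimum peak count, and the square-root scaling are all assumptions made for this example.

```python
import numpy as np


def process_peaks(mz, intensity, min_mz=50.0, max_mz=2500.0, min_peaks=2):
    """Hypothetical peak processing; returns None for an invalid spectrum."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Keep only peaks inside the assumed m/z range with positive intensity.
    keep = (mz >= min_mz) & (mz <= max_mz) & (intensity > 0)
    mz, intensity = mz[keep], intensity[keep]
    if mz.size < min_peaks:
        return None
    # Square-root scale the intensities and normalize to unit norm.
    intensity = np.sqrt(intensity)
    intensity = intensity / np.linalg.norm(intensity)
    return mz, intensity


def build_index(spectra):
    """Process every spectrum once and index only the valid ones."""
    index = []
    for mz, intensity in spectra:
        processed = process_peaks(mz, intensity)
        if processed is not None:
            index.append(processed)
    return index


raw_spectra = [
    ([100.0, 200.0, 300.0], [1.0, 4.0, 9.0]),
    ([10.0], [5.0]),  # no peaks inside the m/z range, so it is filtered out
]
index = build_index(raw_spectra)
```

Retrieving a spectrum later would then just read the stored arrays, with no processing or validity check at that point.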
A disadvantage, however, is that when spectrum processing options change, the index would have to be recreated. Nevertheless, I think our spectrum processing is based on best practices and we don't really vary it, so for Casanovo at least it should be pretty fixed.
Additionally, filtering spectra while building the index breaks the link between the indexes of the output PSMs and the input spectra, although this is a broader issue (#70).
@wfondrie What do you think?
from casanovo.
I'm open to this. I also think that the current way of tracking the spectrum index is pretty fragile and not ideal. Instead, that information should be saved in the index itself. Ayse started a PR in depthcharge to solve the spectrum tracking issue, but it was incomplete - I'll see if I can get it updated and integrated.
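To make "saved in the index itself" concrete, here is a minimal sketch in which every index entry records where its spectrum came from, so output PSMs can still be traced back after invalid spectra are skipped. This is not the depthcharge PR mentioned above; `IndexEntry`, `index_spectra`, the file name, and the peak-count validity check are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """Hypothetical index record that remembers its origin."""
    source_file: str
    source_position: int  # position of the spectrum within the input file
    n_peaks: int


def index_spectra(source_file, peak_counts, min_peaks=2):
    """Index valid spectra only, keeping each one's original position."""
    entries = []
    for position, n_peaks in enumerate(peak_counts):
        if n_peaks < min_peaks:
            continue  # invalid spectrum: skipped, later positions unchanged
        entries.append(IndexEntry(source_file, position, n_peaks))
    return entries


# The second spectrum is invalid, so index row 1 maps back to input position 2.
entries = index_spectra("run1.mgf", [120, 0, 85])
origins = [(e.source_file, e.source_position) for e in entries]
```

With the origin stored per entry, the row number in the index no longer has to match the spectrum's position in the input file.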
For future reference, #105 will address spectrum index tracking once completed.