Comments (4)
Hi Guhan,
I haven't encountered an early stopping problem during training before, and it's odd that the model calls `` at the 11929th iteration despite correctly recognizing that there would be 19028 iterations in total. Did you try training the model on a smaller data set, maybe a subsample of the one you're using, and if so, did you observe the same early stopping behavior?
Hi @melihyilmaz,
I found the issue after some digging in the PyTorch Lightning source code.
In the `pytorch_lightning/loops/base.py` file, I noticed that the `run()` function catches `StopIteration`, which effectively tells the training loop to end:
def run(self, *args: Any, **kwargs: Any) -> T:
    """The main entry point to the loop.

    Will frequently check the :attr:`done` condition and calls :attr:`advance`
    until :attr:`done` evaluates to ``True``.

    Override this if you wish to change the default behavior. The default implementation is:

    Example::

        def run(self, *args, **kwargs):
            if self.skip:
                return self.on_skip()

            self.reset()
            self.on_run_start(*args, **kwargs)

            while not self.done:
                self.advance(*args, **kwargs)

            output = self.on_run_end()
            return output

    Returns:
        The output of :attr:`on_run_end` (often outputs collected from each step of the loop)
    """
    if self.skip:
        return self.on_skip()

    self.reset()
    self.on_run_start(*args, **kwargs)

    while not self.done:
        try:
            self.on_advance_start(*args, **kwargs)
            self.advance(*args, **kwargs)
            self.on_advance_end()
            self.restarting = False
        except StopIteration:
            break

    output = self.on_run_end()
    return output
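To illustrate why this pattern hides the real failure, here is a minimal sketch in plain Python. `TinyLoop` and its `fail_at` parameter are hypothetical stand-ins invented for this example and are not part of PyTorch Lightning. The sketch only shows that a `StopIteration` raised inside a step ends the loop quietly, well before the expected number of steps.

```python
class TinyLoop:
    """Hypothetical stand-in for a training loop (not the Lightning class)."""

    def __init__(self, total_steps, fail_at):
        self.total_steps = total_steps
        self.fail_at = fail_at  # assumed step at which a StopIteration surfaces
        self.step = 0

    @property
    def done(self):
        return self.step >= self.total_steps

    def advance(self):
        if self.step == self.fail_at:
            # Simulates a StopIteration escaping from somewhere inside a step.
            raise StopIteration
        self.step += 1

    def run(self):
        while not self.done:
            try:
                self.advance()
            except StopIteration:
                break  # the loop ends here with no traceback
        return self.step


loop = TinyLoop(total_steps=19028, fail_at=11929)
completed = loop.run()
print(f"completed {completed} of {loop.total_steps} steps")
```

Because the `except` branch simply breaks, nothing in the output points to the step or input that triggered the exception.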
I removed the `try`/`except` and got a `StopIteration` on a particular spectrum in the `.mgf` that looked like this:
BEGIN IONS
TITLE=782.5711
PEPMASS=1088.5186
CHARGE=3+
SCANS=5711
RTINSECONDS=134.84
SEQ=QASTSEESDEMPVPDSESVFVIPGSALLWR
2731.2097 7026.7
END IONS
I had turned spectrum preprocessing on in the `config.py` file, and the default for `max_mz` in the `preprocess_peaks` function is 2500. The spectrum's only peak, at m/z 2731.2097, lies above that limit, so the input to the `_get_filter_intensity_mask` function was an empty intensity array. This then threw an IndexError at `min_intensity *= intensity[intensity_idx[-1]]`: `IndexError: index -1 is out of bounds for axis 0 with size 0`.
TLDR: I think it would be nice to include some error catching for spectra that end up empty after preprocessing, if possible, especially since the pytorch-lightning module has such bad error messaging! I will remove this spectrum from the file for now.
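As a rough idea of what such a check could look like, here is a minimal sketch in plain Python. The helper names `filter_peaks_by_mz` and `preprocess_or_skip` are hypothetical and are not Casanovo's actual functions. The sketch assumes peaks are given as parallel lists of m/z and intensity values, and it uses the 2500 default for `max_mz` mentioned above.

```python
def filter_peaks_by_mz(mz, intensity, min_mz=0.0, max_mz=2500.0):
    """Keep only peaks with min_mz <= m/z <= max_mz (hypothetical helper)."""
    kept = [(m, i) for m, i in zip(mz, intensity) if min_mz <= m <= max_mz]
    return [m for m, _ in kept], [i for _, i in kept]


def preprocess_or_skip(mz, intensity, max_mz=2500.0):
    """Return the filtered peaks, or None when no peak survives filtering.

    Returning None lets the caller skip (or log) the spectrum instead of
    indexing into an empty intensity array later on.
    """
    mz, intensity = filter_peaks_by_mz(mz, intensity, max_mz=max_mz)
    if not intensity:
        return None
    return mz, intensity


# The single peak from the spectrum above lies beyond max_mz.
result = preprocess_or_skip([2731.2097], [7026.7])
print("skipped" if result is None else "kept")
```

Whether to skip such a spectrum silently, log a warning, or raise a descriptive error is a design choice for the maintainers. The point is only to check for emptiness before any indexing happens.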
I see, thanks for raising this issue and letting us know. I'll leave the issue open to add error handling in the future.
Fixed in #55.