noble-lab / casanovo
De Novo Mass Spectrometry Peptide Sequencing with a Transformer Model
Home Page: https://casanovo.readthedocs.io
License: Apache License 2.0
Hi,
This question may be a little silly. The output file contains peptides such as "C+57.021GHTNNLRPK", and I don't quite understand what "+57.021" means. Is the peptide predicted by the algorithm "CGHTNNLRPK", or is there some unknown amino acid between C and GHTNNLRPK?
0 | LAHYNKR | 0.991221964
1 | VKEDPDGEAHR | 0.965903959
2 | C+57.021GHTNNLRPK | 0.991358876
3 | VVQEQGTHPK | 0.987940228
4 | KGKPELR | 0.991316216
5 | SLSHSPGK | 0.993293308
Thanks,
LeeLee
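For context: the "+57.021" is a mass shift in daltons attached to the residue before it, not an extra amino acid. On cysteine, +57.021 Da corresponds to carbamidomethylation, the usual product of iodoacetamide alkylation, so the residue sequence is still CGHTNNLRPK. A minimal sketch of separating the plain sequence from the mass shifts, assuming modifications are always written as a signed number directly after a residue letter (split_peptide is a hypothetical helper, not part of Casanovo, and terminal modifications are not handled):

```python
import re

# One residue letter, optionally followed by a signed mass shift,
# e.g. "C+57.021". This pattern is an assumption about the output format.
TOKEN = re.compile(r"([A-Z])([+-]\d+(?:\.\d+)?)?")


def split_peptide(annotated):
    """Return (plain_sequence, [(position, residue, mass_shift), ...]).

    Positions are 1-based and count residues, not characters.
    """
    residues = []
    mods = []
    for match in TOKEN.finditer(annotated):
        residue, shift = match.group(1), match.group(2)
        residues.append(residue)
        if shift is not None:
            mods.append((len(residues), residue, float(shift)))
    return "".join(residues), mods


print(split_peptide("C+57.021GHTNNLRPK"))
# ('CGHTNNLRPK', [(1, 'C', 57.021)])
```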
I think we should add a command line parameter such as --output-root that specifies the root of the output files produced by Casanovo. Then we should have two files, .casanovo.txt and .casanovo.log.txt, where the former contains the PSMs and the latter contains all of the messages sent to stderr.
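A minimal sketch of how the two file names could be derived from a single root, assuming the root may include a directory part. The function name output_paths is hypothetical; only the two suffixes come from the proposal above.

```python
from pathlib import Path


def output_paths(output_root):
    """Derive the PSM file and the log file from one output root."""
    root = Path(output_root)
    # Append to the name rather than using with_suffix(), so that a root
    # such as "run.v2" keeps its own dots.
    psm_file = root.with_name(root.name + ".casanovo.txt")
    log_file = root.with_name(root.name + ".casanovo.log.txt")
    return psm_file, log_file


psm_file, log_file = output_paths("results/run1")
print(psm_file.name, log_file.name)
```

Keeping both files next to each other under one root makes it easy to pair a set of PSMs with the messages produced during the same run.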
Hi~
Thanks for your wonderful work on de novo sequencing. When I tried to train a model on multiple MGF files, I got an error.
My command:
casanovo --model=train --train_data_path=/my-mgf-folder ......
The error:
ValueError(f"Only MGF files are supported.")
Am I doing something wrong?
A .py file is not a proper config file format - Choose a standard such as .ini (https://docs.python.org/3/library/configparser.html), TOML, or YAML.
Hi~ I got the following error. My mgf file was converted from the .raw file using msconvert.
File "/home/songjian/anaconda2/envs/casanovo/lib/python3.7/site-packages/depthcharge/data/parsers.py", line 157, in parse_spectrum
self.annotations.append(spectrum["params"]["seq"])
Am I doing something wrong?
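The traceback stops at the lookup of the "seq" parameter, which suggests that some spectra have no SEQ= line. That would be expected for a file that comes straight from a raw-file conversion, since peptide annotations are normally added afterwards. A minimal sketch for listing the spectra without an annotation, assuming a plain-text MGF with BEGIN IONS / END IONS blocks (spectra_missing_seq is a hypothetical helper, not part of depthcharge):

```python
def spectra_missing_seq(mgf_text):
    """Return the TITLE (or running index) of each spectrum lacking SEQ=."""
    missing = []
    index = -1
    title, has_seq, inside = None, False, False
    for raw in mgf_text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            index += 1
            title, has_seq, inside = None, False, True
        elif line == "END IONS" and inside:
            if not has_seq:
                missing.append(title if title is not None else str(index))
            inside = False
        elif inside and line.upper().startswith("SEQ="):
            has_seq = True
        elif inside and line.upper().startswith("TITLE="):
            title = line[len("TITLE="):]
    return missing


example = """BEGIN IONS
TITLE=annotated
SEQ=PEPTIDE
100.0 1.0
END IONS
BEGIN IONS
TITLE=unannotated
100.0 1.0
END IONS
"""
print(spectra_missing_seq(example))  # ['unannotated']
```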
Hi,
I read the code but cannot understand the following:
self.latent_spectrum = torch.nn.Parameter(torch.randn(1, 1, dim_model))
# Add the spectrum representation to each input:
latent_spectra = self.latent_spectrum.expand(peaks.shape[0], -1, -1)
peaks = torch.cat([latent_spectra, peaks], dim=1)
Why do we prepend a randomly initialized vector to the encoded peaks?
Thanks for your kind answer.
Invalid spectra after preprocessing are replaced by a dummy spectrum, but Casanovo still predicts a peptide for them. The resulting predictions are naturally incorrect, consisting of long peptide sequences with low(ish) scores (but not obviously wrong).
Instead, invalid spectra should be filtered out, or no prediction should be given for them. The former is probably better, because it might be a factor during training as well. I haven't fully been able to figure out how to skip items in the dataloader though.
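One common way to skip items is to let the dataset return None for an invalid spectrum and drop those entries in the collate function (torch.utils.data.DataLoader accepts a collate_fn argument). A minimal sketch of that idea in plain Python, assuming each item is an (mz, intensity) pair of lists. The names is_valid and collate_skip_invalid and the min_peaks threshold are hypothetical, and the real dataset would still need to build tensors from the surviving items:

```python
def is_valid(spectrum, min_peaks=1):
    """A spectrum is usable if it has enough peaks and matching arrays."""
    mz, intensity = spectrum
    return len(mz) >= min_peaks and len(mz) == len(intensity)


def collate_skip_invalid(batch):
    """Drop None placeholders and invalid spectra before batching."""
    return [s for s in batch if s is not None and is_valid(s)]


batch = [([100.0, 200.0], [1.0, 0.5]), ([], []), None]
print(collate_skip_invalid(batch))  # [([100.0, 200.0], [1.0, 0.5])]
```

A batch can end up empty this way, so the calling code would have to tolerate that case.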
Hello!
I am interested in using Casanovo for inference, and I just wanted to make sure that I am using it correctly.
I am using PyTorch version 1.10.2 with Cuda 11.3 on a machine with a GPU and 8 workers, and adjusted the config.py file accordingly:
#Hardware options
num_workers = 8
gpus = [0] #None for CPU, int list to specify GPUs
When I run inference on the attached file (renamed to test.mgf.txt so it can be attached here), I get the following result:
spectrum_id,denovo_seq
0,LLAETLLR
However, I know that the inference from MSFragger yields QLEQVIAK, which is the true peptide. I am using the pretrained human weights. Would you mind terribly testing the algorithm on the attached file to see if I am simply using Casanovo wrong? Thank you so much, and I really appreciate the help.
Hi~
I trained Casanovo using my own MGF file (https://figshare.com/articles/dataset/Casanovo-Train-MGF/19204794, size 85M), but the log reported:
attn = torch.bmm(q, k.transpose(-2, -1))
RuntimeError: CUDA out of memory. Tried to allocate 2.62 GiB (GPU 1; 10.76 GiB total capacity; 7.72 GiB already allocated; 1.84 GiB free; 7.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I used the default config.py (batch size 32) and a 2080Ti.
Is there something wrong with my MGF file, or is it something else?
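The failing line computes the attention score matrix, whose size grows with the square of the number of peaks per spectrum, so memory use depends on spectrum length and batch size rather than on the size of the file. A rough estimate, assuming float32 values and 8 attention heads (the head count is an assumption, not read from the config):

```python
def attention_scores_bytes(batch_size, n_heads, n_peaks, bytes_per_value=4):
    """Bytes needed for one layer's attention scores for a whole batch."""
    return batch_size * n_heads * n_peaks * n_peaks * bytes_per_value


GIB = 1024 ** 3
for n_peaks in (150, 500, 1500):
    size = attention_scores_bytes(batch_size=32, n_heads=8, n_peaks=n_peaks)
    print(f"{n_peaks} peaks: {size / GIB:.3f} GiB")
```

If this is the cause, a smaller batch size or a cap on the number of peaks kept per spectrum should reduce the allocation; doubling the peak count quadruples it.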
Use mkdocs (https://www.mkdocs.org/). Alternatively sphinx (https://www.sphinx-doc.org/en/master/)
On README.md and on the website.
Consider adding preprocess_spec, test_batch_size, num_workers, gpus
Right now, all of them say TEXT - For example, the --mode option looks like it takes two choices, but I don’t know what those are purely from looking at the help (in general, if an option has two choices, it would be better served by a flag instead). Try to be more descriptive with those options and improve documentation for --help.
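A minimal sketch of a more descriptive option, written with the standard-library argparse rather than the click library Casanovo uses, so that it runs on its own. The three mode names are the ones that appear in commands elsewhere in these reports; the help texts and the program name are placeholders:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="casanovo-sketch")
    parser.add_argument(
        "--mode",
        choices=["train", "eval", "denovo"],  # listed in --help automatically
        required=True,
        help="Operation to run: train a model, evaluate a model, "
        "or predict peptides de novo.",
    )
    parser.add_argument(
        "--num_workers",
        type=int,
        default=0,
        metavar="INT",
        help="Number of data-loading worker processes (default: %(default)s).",
    )
    return parser


args = build_parser().parse_args(["--mode", "denovo"])
print(args.mode, args.num_workers)
```

With explicit choices, the help output shows the accepted values instead of a bare TEXT placeholder, and an invalid value is rejected before any data is read.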
See if we can assign the maximum num_workers; we have to check the PyTorch Lightning documentation.
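A minimal sketch of picking a default from the machine itself, independent of PyTorch Lightning. max_num_workers is a hypothetical helper; whether the maximum is actually the best setting would need to be measured:

```python
import os


def max_num_workers():
    """CPUs available to this process, falling back to the machine total."""
    try:
        # Respects CPU affinity limits; only available on some platforms.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1


print(max_num_workers())
```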
I'd like to run Casanovo on a machine with no GPU, just in "de novo" mode (i.e., without training the model). The sample config.yaml file indicates that this is possible:
gpus: [0] #None for CPU, int list to specify GPUs
However, when I changed the above line to
gpus: None #None for CPU, int list to specify GPUs
I still get an error message:
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/utilities/device_parser.py", line 131, in _normalize_parse_gpu_string_input
return int(s.strip())
ValueError: invalid literal for int() with base 10: 'None'
The full output is listed below.
casanovo --mode=denovo --model_path=../../../data/22-07-02_weights/pretrained_excl_mouse.ckpt --test_data_path=mgf --config_path=config.yaml
/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 9010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
INFO: De novo sequencing with Casanovo...
INFO: Created a temporary directory at /tmp/tmp5l3bzhid
INFO: Writing /tmp/tmp5l3bzhid/_remote_module_non_scriptable.py
INFO: Reading 1 files...
mgf/2022_01_13_HAB_timeseries_DDA_54_17-17.mgf: 53109spectra [00:47, 1122.26spectra/s]
Traceback (most recent call last):
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/bin/casanovo", line 8, in <module>
sys.exit(main())
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/casanovo.py", line 83, in main
denovo(test_data_path, model_path, config, output_path)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/train_test.py", line 242, in denovo
num_sanity_val_steps=config['num_sanity_val_steps']
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 38, in insert_env_defaults
return fn(self, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 426, in __init__
gpu_ids, tpu_cores = self._parse_devices(gpus, auto_select_gpus, tpu_cores)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1543, in _parse_devices
gpu_ids = device_parser.parse_gpu_ids(gpus)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/utilities/device_parser.py", line 78, in parse_gpu_ids
gpus = _normalize_parse_gpu_string_input(gpus)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/utilities/device_parser.py", line 131, in _normalize_parse_gpu_string_input
return int(s.strip())
ValueError: invalid literal for int() with base 10: 'None'
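The likely cause is YAML syntax: a bare None in a YAML file is read as the string "None", which is what int() then fails on. The YAML null value is written null or ~. A minimal sketch of a guard that a config loader could apply, assuming the value arrives as a string, a list, or None (normalize_gpus is a hypothetical helper, not part of Casanovo):

```python
def normalize_gpus(value):
    """Map common spellings of 'no GPU' to None; pass other values through."""
    if value is None:
        return None
    if isinstance(value, str):
        if value.strip().lower() in {"none", "null", "~", ""}:
            return None
        return value  # e.g. "0,1", left for the trainer to interpret
    return [int(v) for v in value]


print(normalize_gpus("None"))  # None
print(normalize_gpus([0]))     # [0]
```

In the YAML file itself, writing gpus: null should give a real null without any extra code.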
One minor thing - in casanovo/denovo/train_test.py, the f.suffix.lower() part is consistently throwing an error, saying str doesn't have attribute suffix. I have changed it to os.path.splitext(f)[1].lower() instead and it works fine.
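The error indicates that f is a plain string at that point, while .suffix is an attribute of pathlib.Path objects. A minimal sketch that accepts either type (is_mgf is a hypothetical helper name):

```python
from pathlib import Path


def is_mgf(filename):
    """True if the file extension is .mgf, ignoring case."""
    # Path() accepts both str and Path, so callers can pass either.
    return Path(filename).suffix.lower() == ".mgf"


print(is_mgf("data/Sample.MGF"), is_mgf("test.mgf.txt"))  # True False
```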
The pretrained_excl_yeast.ckpt checkpoint has some missing/unexpected keys; it also differs in size from all other checkpoints (by a few bytes):
Missing key(s) in state_dict: "encoder.peak_encoder.sin_term", "encoder.peak_encoder.cos_term", "encoder.peak_encoder.int_encoder.weight", "decoder.charge_encoder.weight", "decoder.mass_encoder.sin_term", "decoder.mass_encoder.cos_term".
Unexpected key(s) in state_dict: "encoder.mz_encoder.sin_term", "encoder.mz_encoder.cos_term", "encoder.mz_encoder.linear.weight", "decoder.precursor_encoder.weight", "decoder.precursor_encoder.bias".
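The renamed prefixes (mz_encoder versus peak_encoder, precursor_encoder versus charge_encoder and mass_encoder) would be consistent with this checkpoint having been saved by a different version of the model code, although that is a guess. A minimal sketch of the comparison behind such a message, using plain lists of key names (compare_state_dict_keys is a hypothetical helper):

```python
def compare_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, as in the message above."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # model expects, file lacks
    unexpected = sorted(checkpoint_keys - model_keys)   # file has, model does not
    return missing, unexpected


model = ["encoder.peak_encoder.sin_term", "decoder.charge_encoder.weight"]
checkpoint = ["encoder.mz_encoder.sin_term", "decoder.precursor_encoder.weight"]
print(compare_state_dict_keys(model, checkpoint))
```

Running the same comparison against one of the other checkpoints would show whether only this file is affected.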
Hi,
The parameter 'reverse_peptide_seqs' in config.py is defined but not used anywhere else.
By the way, why should the peptide sequences be reversed when encoding them?
Thanks for your kind answer.
Hi,
I used pip install git+https://github.com/Noble-Lab/casanovo.git#egg=casanovo to install Casanovo, and then ran into some trouble. This is the error message:
ERROR: Could not find a version that satisfies the requirement depthcharge-ms (unavailable) (from casanovo) (from versions: none)
ERROR: No matching distribution found for depthcharge-ms (unavailable)
I also tried installing from https://github.com/wfondrie/depthcharge, but it doesn't seem to work properly. This is the error message:
ModuleNotFoundError: No module named 'depthcharge.embed'
How can I solve this problem?
Thanks,
LeeLee
I failed to find the ‘depthcharge’ folder
Hi there,
I've been trying to run your code on Spyder through Anaconda, but I've been having trouble running the following line:
>>run casanovo --mode=denovo --model_path='pretrained_excl_clambacteria.ckpt' --test_data_path='dark_control_1.mgf' --config_path='config' --output_path='test.csv'
The following error is reported back:
Traceback (most recent call last):
File "F:\Applications\Noble\casanovo\casanovo.py", line 3, in <module>
from casanovo.denovo import train, test_evaluate, test_denovo
ModuleNotFoundError: No module named 'casanovo.denovo'; 'casanovo' is not a package
I've noticed that some of the files being called appear to have different names to their current version on GitHub.
E.g. in casanovo.py, it tries to import 'train', 'test_evaluate', and 'test_denovo' from casanovo/denovo.
However, in this directory the files are named 'train_test', 'evaluate', and 'model'.
Having played around changing the relevant lines in this file to get it to work, it brought forth more import errors from other files.
Eventually 'fixing' them produces a circular import so I don't think what I've done is correct.
Do you know what might be causing this error? Am I correct in thinking these files are currently mis-named?
Any help would be greatly appreciated.
Best wishes,
Alex
It looks like if you don't specify --output_path=foo.tsv then Casanovo fails with this error message:
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/model.py", line 435, in on_test_epoch_end
with open(os.path.join(str(self.output_path),'casanovo_output.csv'), 'w') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'None/casanovo_output.csv'
FYI, the full output is below.
casanovo --mode=denovo --model_path=../../../data/22-07-02_weights/pretrained_excl_mouse.ckpt --test_data_path=mgf --config_path=config.yaml
INFO: De novo sequencing with Casanovo...
INFO: Created a temporary directory at /tmp/tmpfm17yd8s
INFO: Writing /tmp/tmpfm17yd8s/_remote_module_non_scriptable.py
INFO: Reading 1 files...
mgf/2022_01_13_HAB_timeseries_DDA_54_17-17.mgf: 53109spectra [00:18, 2879.99spectra/s]
/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:287: LightningDeprecationWarning: Passing `Trainer(accelerator='ddp')` has been deprecated in v1.5 and will be removed in v1.7. Use `Trainer(strategy='ddp')` instead.
f"Passing `Trainer(accelerator={self.distributed_backend!r})` has been deprecated"
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W socket.cpp:401] [c10d] The server socket cannot be initialized on [::]:36057 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:36057 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:36057 (errno: 97 - Address family not supported by protocol).
INFO: Added key: store_based_barrier_key:1 to store for rank: 0
INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Testing: 100% 52/52 [12:38<00:00, 9.69s/it]Traceback (most recent call last):
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/bin/casanovo", line 8, in <module>
sys.exit(main())
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/casanovo.py", line 83, in main
denovo(test_data_path, model_path, config, output_path)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/train_test.py", line 246, in denovo
trainer.test(model_trained, loaders.test_dataloader())
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 911, in test
return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 954, in _test_impl
results = self._run(model, ckpt_path=self.tested_ckpt_path)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
self._dispatch()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1275, in _dispatch
self.training_type_plugin.start_evaluating(self)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 206, in start_evaluating
self._results = trainer.run_stage()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1286, in run_stage
return self._run_evaluate()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1334, in _run_evaluate
eval_loop_results = self._evaluation_loop.run()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 151, in run
output = self.on_run_end()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 134, in on_run_end
self._on_evaluation_epoch_end()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 241, in _on_evaluation_epoch_end
self.trainer.call_hook(hook_name)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1501, in call_hook
output = model_fx(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/model.py", line 435, in on_test_epoch_end
with open(os.path.join(str(self.output_path),'casanovo_output.csv'), 'w') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'None/casanovo_output.csv'
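The path 'None/casanovo_output.csv' comes from calling str() on an unset output_path. A minimal sketch of a fallback to the current working directory, which is only one possible default (resolve_output_file is a hypothetical helper; the file name is the one shown in the traceback):

```python
import os


def resolve_output_file(output_path, filename="casanovo_output.csv"):
    """Join the output directory and file name, defaulting to the cwd."""
    directory = os.getcwd() if output_path is None else str(output_path)
    return os.path.join(directory, filename)


print(resolve_output_file(None))
print(resolve_output_file("results"))
```

Checking for None before the test loop starts would also avoid losing twelve minutes of predictions at the final write.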
With --mode=denovo and pretrained human weights this spectrum:
BEGIN IONS
TITLE=crasher
RTINSECONDS=2800
PEPMASS=1551
CHARGE=4+
1552.40144197 301.23929755
1551.20606409 1051.24587242
1550.88090936 1053.2816881
END IONS
raises an IndexError at https://github.com/Noble-Lab/casanovo/blob/main/casanovo/data/datasets.py#L185
To trigger, the peaks need to be sufficiently close to each other and to PEPMASS.
We should implement the precursor mass filter in Casanovo by subtracting 1 from the peptide-level score if the peptide mass is outside of a user-specified range. This range should be specified in the YAML file in units of ppm. We can name the parameter precursor_window and give it a default value of 50.
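A minimal sketch of the proposed filter, assuming the comparison is made on m/z, that peptide_mass is the neutral monoisotopic mass of the predicted peptide, and that the window is symmetric. The function names are hypothetical; precursor_window and its default of 50 ppm come from the proposal above:

```python
PROTON_MASS = 1.007276  # Da, added once per charge


def precursor_ppm_error(observed_mz, peptide_mass, charge):
    """Signed difference between observed and calculated m/z, in ppm."""
    calc_mz = (peptide_mass + charge * PROTON_MASS) / charge
    return (observed_mz - calc_mz) / calc_mz * 1e6


def filtered_score(score, observed_mz, peptide_mass, charge,
                   precursor_window=50.0):
    """Subtract 1 from the score if the mass error exceeds the window."""
    ppm = precursor_ppm_error(observed_mz, peptide_mass, charge)
    return score - 1.0 if abs(ppm) > precursor_window else score


calc = (1000.0 + 2 * PROTON_MASS) / 2
print(filtered_score(0.9, calc, 1000.0, 2))                 # unchanged
print(filtered_score(0.9, calc * (1 + 100e-6), 1000.0, 2))  # penalized
```

Because scores lie between 0 and 1, subtracting 1 keeps every penalized prediction below every unpenalized one while preserving the ranking within each group.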
Hi,
I have tried two ways to run this project but both failed.
Method 1: casanovo --mode=eval --model_path='./casanovo_pretrained_model_weights/' --test_data_path='./Casanovo_preprocessed_data/' --config_path='../casanovo/config.py'
I got
INFO: Evaluating Casanovo...
Traceback (most recent call last):
File "/usr/local/bin/casanovo", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/casanovo/casanovo.py", line 37, in main
test_evaluate(test_data_path, model_path, config_path)
File "/usr/local/lib/python3.8/dist-packages/casanovo/denovo/train_test.py", line 146, in test_evaluate
model_trained = Spec2Pep().load_from_checkpoint(
File "/usr/local/lib/python3.8/dist-packages/casanovo/denovo/model.py", line 103, in __init__
self.encoder = SpectrumEncoder(
File "/usr/local/lib/python3.8/dist-packages/depthcharge/components/transformers.py", line 60, in __init__
layer = torch.nn.TransformerEncoderLayer(
TypeError: __init__() got an unexpected keyword argument 'batch_first'
Method 2: python casanovo.py --mode=eval --model_path='../../data/casanovo_pretrained_model_weights/' --test_data_path='../../data/Casanovo_preprocessed_data/' --config_path='config.py'
I got:
Traceback (most recent call last):
File "casanovo.py", line 3, in <module>
from casanovo.denovo import train, test_evaluate, test_denovo
File "/home/bio/casanovo-main/casanovo/casanovo.py", line 3, in <module>
from casanovo.denovo import train, test_evaluate, test_denovo
ModuleNotFoundError: No module named 'casanovo.denovo'; 'casanovo' is not a package
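The TypeError in Method 1 is what an older PyTorch raises: the batch_first argument of torch.nn.TransformerEncoderLayer was added in PyTorch 1.9, so earlier versions reject it. In Method 2, running casanovo.py from inside the source directory makes that script shadow the installed casanovo package, which matches the "'casanovo' is not a package" message. A minimal sketch of a version check (version_tuple and supports_batch_first are hypothetical helpers, not part of Casanovo or depthcharge):

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.10.2+cu113' into (1, 10, 2)."""
    numbers = []
    for piece in version.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)  # ignore suffixes such as 'rc1'
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def supports_batch_first(torch_version):
    """True if this PyTorch version accepts batch_first in transformer layers."""
    return version_tuple(torch_version) >= (1, 9)


print(supports_batch_first("1.8.1"), supports_batch_first("1.10.2+cu113"))
# False True
```

In practice the string to check would be torch.__version__ from the environment that runs Casanovo.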
Implement if not
At least adapt the tests from Depthcharge. Other tests to be written TBD.
Currently predicted PSMs are identified by the index of their spectra in the data loader. However, when predicting from multiple input files, this doesn't include information from which file the spectra came, pretty much making the index useless.
Can we track the origin of the input spectra, i.e. using filename and scan number? This probably needs to be modified in the index in depthcharge, and should then be passed through when predicting (but no need to move that information into a tensor on the GPU) so that it is available when writing the output results.
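A minimal sketch of the bookkeeping this would need, assuming the number of spectra per input file is known when the index is built. The position within the file stands in for the scan number here, and build_spectrum_ids is a hypothetical helper rather than anything in depthcharge:

```python
def build_spectrum_ids(spectra_per_file):
    """Map each global loader index to (filename, index within that file)."""
    ids = []
    for filename, n_spectra in spectra_per_file:
        ids.extend((filename, i) for i in range(n_spectra))
    return ids


ids = build_spectrum_ids([("a.mgf", 2), ("b.mgf", 1)])
print(ids[2])  # ('b.mgf', 0)
```

The list stays on the CPU and is only consulted when results are written, so nothing extra has to be moved into a tensor.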
I tried to use Casanovo to make predictions on an MGF file containing 31,078 spectra, and it ran out of memory. Is there anything I can do to mitigate this problem, other than breaking up the input file into small pieces or switching to a different machine?
casanovo --mode=denovo --model_path=/net/noble/vol1/home/noble/proj/2022_varun_ls-casanovo/data/22-07-02_weights/pretrained_excl_mouse.ckpt --test_data_path=20190227_231_15%_1 --output_path=20190227_231_15%_1 --config_path=config.yaml
INFO: De novo sequencing with Casanovo...
INFO: Created a temporary directory at /tmp/tmpzqps6s6h
INFO: Writing /tmp/tmpzqps6s6h/_remote_module_non_scriptable.py
INFO: Reading 1 files...
20190227_231_15%_1/20190227_231_15%_1.mgf: 31078spectra [00:08, 3647.09spectra/s]
/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:287: LightningDeprecationWarning: Passing `Trainer(accelerator='ddp')` has been deprecated in v1.5 and will be removed in v1.7. Use `Trainer(strategy='ddp')` instead.
f"Passing `Trainer(accelerator={self.distributed_backend!r})` has been deprecated"
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W socket.cpp:401] [c10d] The server socket cannot be initialized on [::]:55938 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:55938 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:55938 (errno: 97 - Address family not supported by protocol).
INFO: Added key: store_based_barrier_key:1 to store for rank: 0
INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Testing: 35% 11/31 [02:46<03:05, 9.26s/it]Traceback (most recent call last):
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/bin/casanovo", line 8, in <module>
sys.exit(main())
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/casanovo.py", line 83, in main
denovo(test_data_path, model_path, config, output_path)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/train_test.py", line 246, in denovo
trainer.test(model_trained, loaders.test_dataloader())
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 911, in test
return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 954, in _test_impl
results = self._run(model, ckpt_path=self.tested_ckpt_path)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
self._dispatch()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1275, in _dispatch
self.training_type_plugin.start_evaluating(self)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 206, in start_evaluating
self._results = trainer.run_stage()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1286, in run_stage
return self._run_evaluate()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1334, in _run_evaluate
eval_loop_results = self._evaluation_loop.run()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
output = self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 213, in _evaluation_step
output = self.trainer.accelerator.test_step(step_kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 247, in test_step
return self.training_type_plugin.test_step(*step_kwargs.values())
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 450, in test_step
return self.lightning_module.test_step(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/model.py", line 403, in test_step
pred_seqs, scores = self.predict_step(batch)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/model.py", line 188, in predict_step
return self(batch[0], batch[1])
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/model.py", line 163, in forward
scores, tokens = self.greedy_decode(spectra, precursors)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/model.py", line 212, in greedy_decode
memories, mem_masks = self.encoder(spectra)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/depthcharge/components/transformers.py", line 105, in forward
return self.transformer_encoder(peaks, src_key_padding_mask=mask), mask
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/nn/modules/transformer.py", line 238, in forward
output = mod(output, src_mask=mask, src_key_padding_mask=src_key_padding_mask)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/nn/modules/transformer.py", line 456, in forward
src_mask if src_mask is not None else src_key_padding_mask,
RuntimeError: CUDA out of memory. Tried to allocate 714.00 MiB (GPU 0; 7.79 GiB total capacity; 2.46 GiB already allocated; 632.94 MiB free; 3.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Hi Melih,
I think I have solved the install issue.
I was trying to install it with Python 3.10, but it appears torch is not fully supported on that version yet, and the failure went unreported when installing through Spyder.
Downgrading Python to 3.9 and installing through the command line installed it correctly, and the packages now import without problems.
I have however had issues trying to load in my datasets.
The example command is:
casanovo --mode=denovo --model_path='path/to/pretrained' --test_data_path='path/to/test' --config_path='path/to/config' --output_path='path/to/output'
I've tried different combinations of the following for the paths but have had no luck.
Could you please give some guidance on the best way to call the script with the correct arguments on Windows?
Best wishes,
Alex
A couple of examples:
casanovo --mode=denovo --model_path='F:\PhD_Data\Casanovo_Data' --test_data_path='F:\PhD_Data\Casanovo_Data' --config_path=F:\PhD_Data\Casanovo_Data' --output_path='F:\PhD_Data\Casanovo_Data'
casanovo --mode=denovo --model_path='F:\PhD_Data\Casanovo_Data\pretrained_excl_clambacteria.ckpt' --test_data_path='F:\PhD_Data\Casanovo_Data\dark_control_1.mgf' --config_path='F:\PhD_Data\Casanovo_Data\config_params.py' --output_path='F:\PhD_Data\Casanovo_Data\Output.csv'
casanovo --mode=denovo --model_path=F:\PhD_Data\Casanovo_Data\pretrained_excl_clambacteria.ckpt --test_data_path=F:\PhD_Data\Casanovo_Data\dark_control_1.mgf --config_path=F:\PhD_Data\Casanovo_Data\config_params.py --output_path=F:\PhD_Data\Casanovo_Data\Output.csv
casanovo --mode=denovo --model_path=F:/PhD_Data/Casanovo_Data/pretrained_excl_clambacteria.ckpt --test_data_path=F:/PhD_Data/Casanovo_Data/dark_control_1.mgf --config_path=F:/PhD_Data/Casanovo_Data/config_params.py --output_path=F:/PhD_Data/Casanovo_Data/Output.csv
Originally posted by @alexmunroclark in #2 (comment)
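One possible contributor to the failing variants above, offered as a guess and not a confirmed diagnosis: in the Windows cmd.exe shell, single quotes are not quoting characters, so they reach the program as part of the path. The sketch below uses only the standard library to show what a literally quoted path looks like to Python. The helper name clean_cli_path is hypothetical and not part of Casanovo.

```python
from pathlib import PureWindowsPath


def clean_cli_path(raw: str) -> PureWindowsPath:
    """Strip quote characters that the shell passed through literally.

    Hypothetical helper for illustration only; not part of Casanovo.
    """
    return PureWindowsPath(raw.strip("'\""))


# What the program receives when cmd.exe is given '...' around a path.
quoted = "'F:\\PhD_Data\\Casanovo_Data\\dark_control_1.mgf'"

# Left as-is, the trailing quote becomes part of the file extension.
raw_suffix = PureWindowsPath(quoted).suffix

# After stripping, the extension and file name are what was intended.
cleaned = clean_cli_path(quoted)
cleaned_suffix = cleaned.suffix
cleaned_name = cleaned.name

# Forward and backward slashes describe the same Windows path.
same_path = PureWindowsPath("F:/PhD_Data/Casanovo_Data") == PureWindowsPath(
    "F:\\PhD_Data\\Casanovo_Data"
)
```

If this is the cause, the unquoted forms (or double quotes) would be the ones to keep trying, with each argument pointing at a file and not at the containing folder.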
Hello again,
For some reason, the on_train_epoch_end hook is called before training is complete. I am trying to train with a pretty large dataset (~1.2M spectra between the training and validation sets, with 1093774 spectra (17,091 batches of 64) for training and 123930 spectra (1,937 batches of 64) for validation). For some reason, casanovo is thinking that training is done after only 11,927 batches (which is significantly less than the 17,091 expected) and so calls on_train_epoch_end, which looks for a history to add the train loss to and fails:
...
/home/ubuntu/smsnet_val_data/1049.mgf: 46spectra [00:00, 2628.76spectra/s]
/home/ubuntu/smsnet_val_data/1276.mgf: 62spectra [00:00, 2859.35spectra/s]
/home/ubuntu/smsnet_val_data/1073.mgf: 1spectra [00:00, 4156.89spectra/s]
/home/ubuntu/smsnet_val_data/1224.mgf: 4spectra [00:00, 5257.67spectra/s]
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Global seed set to 454
initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
INFO: Added key: store_based_barrier_key:1 to store for rank: 0
INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
---------------------------------------------
0 | encoder | SpectrumEncoder | 18.9 M
1 | decoder | PeptideDecoder | 28.4 M
2 | softmax | Softmax | 0
3 | celoss | CrossEntropyLoss | 0
---------------------------------------------
47.3 M Trainable params
0 Non-trainable params
47.3 M Total params
189.387 Total estimated model params size (MB)
/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:631: UserWarning: Checkpoint directory /home/ubuntu/checkpoints exists and is not empty.
rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
Epoch 0: 0%| | 0/19028 [00:00<?, ?it/s]/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/utilities/data.py:59: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 64. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
warning_cache.warn(
Epoch 0: 0%| | 1/19028 [00:01<6:18:18, 1.19s/it, loss=0.868]INFO: Reducer buckets have been rebuilt in this iteration.
Epoch 0: 50%|█████████████████████████████████████████████████████████▏ | 9549/19028 [30:34<30:20, 5.21it/s, loss=1.03]Epoch 0: 50%|█████████████████████████████████████████████████████████▎ | 9575/19028 [30:39<30:16, 5.20it/s, loss=1.04]
Epoch 0: 50%|█████████████████████████████████████████████████████████▎ | 9576/19028 [30:40<30:16, 5.20it/s, loss=1.04]
Epoch 0: 50%|█████████████████████████████████████████████████████████▍ | 9577/19028 [30:40<30:16, 5.20it/s, loss=1.03]
Epoch 0: 50%|█████████████████████████████████████████████████████████▍ | 9582/19028 [30:41<30:15, 5.20it/s, loss=1.01]
Epoch 0: 50%|█████████████████████████████████████████████████████████▍ | 9583/19028 [30:41<30:15, 5.20it/s, loss=1.01]
Epoch 0: 50%|█████████████████████████████████████████████████████████▍ | 9584/19028 [30:41<30:14, 5.20it/s, loss=0.99]
Epoch 0: 50%|████████████████████████████████████████████████████████▉ | 9585/19028 [30:42<30:14, 5.20it/s, loss=0.986]
Epoch 0: 50%|████████████████████████████████████████████████████████▉ | 9586/19028 [30:42<30:14, 5.20it/s, loss=0.975]
Epoch 0: 63%|██████████████████████████████████████████████████████████████████████▏ | 11929/19028 [39:47<23:40, 5.00it/s, loss=0.932]Traceback (most recent call last):
File "/home/ubuntu/mambaforge/envs/denovo/bin/casanovo", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/casanovo/casanovo.py", line 32, in main
train(train_data_path, val_data_path, model_path, config_path)
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/casanovo/denovo/train_test.py", line 134, in train
trainer.fit(model, train_loader.train_dataloader(), val_loader.val_dataloader())
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
self._dispatch()
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
self.training_type_plugin.start_training(self)
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
return self._run_train()
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1319, in _run_train
self.fit_loop.run()
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
self.epoch_loop.run(data_fetcher)
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 151, in run
output = self.on_run_end()
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 298, in on_run_end
self.trainer.call_hook("on_train_epoch_end")
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1501, in call_hook
output = model_fx(*args, **kwargs)
File "/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/casanovo/denovo/model.py", line 425, in on_train_epoch_end
self._history[-1]["train"] = train_loss
IndexError: list index out of range
Epoch 0: 63%|██████▎ | 11929/19028 [39:51<23:43, 4.99it/s, loss=0.932]
I tried wrapping self._history[-1]["train"] = train_loss in a try/except so that this step is skipped when self._history has no entries. However, somehow this skips validation entirely and goes to the next training epoch without validation:
/home/ubuntu/mambaforge/envs/denovo/lib/python3.9/site-packages/pytorch_lightning/utilities/data.py:59: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 64. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
warning_cache.warn(
Global seed set to 454
Epoch 0: 0%| | 1/19028 [00:01<6:21:44, 1.20s/it, loss=3.5, lr=0.000]INFO: Reducer buckets have been rebuilt in this iteration.
Epoch 0: 63%|███████████████████████████████████████████████████████████████▎ | 11929/19028 [39:49<23:41, 4.99it/s, loss=2.17, lr=5.96e-5]INFO: ---------------------------------------------------------------------------------------------------------
INFO: Epoch | Train Loss | Valid Loss | Valid AA precision | Valid AA recall | Valid Peptide recall
INFO: ---------------------------------------------------------------------------------------------------------
INFO: 0 | 2.398110 | 3.474908 | 0.000851 | 0.007873 | 0.000000
Epoch 1: 4%|████▍ | 829/19028 [02:41<59:12, 5.12it/s, loss=2.22, lr=6.37e-5]
I also added the learning rate to this printout just to make sure it hadn't stopped learning or something funky with learning rate had made it stop training. I am running with the same warmup and max iteration configuration that you are running with (100000, 600000). Is there anything that you have encountered that would suggest where the problem is coming from?
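The IndexError in the traceback comes from indexing the last element of a list that is still empty. A minimal sketch of a guard is shown below, assuming the history is a plain list of per-epoch dicts; the function name record_train_loss and the "epoch" key are hypothetical and not taken from Casanovo's code. It only avoids the crash and does not explain why the epoch ends early.

```python
def record_train_loss(history, epoch, train_loss):
    """Store the training loss, creating the epoch entry if it is missing.

    Hypothetical stand-in for the failing assignment in on_train_epoch_end.
    """
    # If validation has not created an entry for this epoch yet, add one
    # instead of indexing into an empty list.
    if not history or history[-1].get("epoch") != epoch:
        history.append({"epoch": epoch})
    history[-1]["train"] = train_loss
    return history


history = []
record_train_loss(history, 0, 2.398110)  # no entry existed: one is created
history[-1]["valid"] = 3.474908          # a later validation step can fill this in
record_train_loss(history, 0, 2.398110)  # entry exists: it is updated in place
```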
Step by step guide to de novo sequencing with Casanovo with a sample mgf file.
When running the following provided command:
casanovo --mode=denovo --model=22-07-02_weights/pretrained_excl_mouse.ckpt --peak_path=sample_data/sample_preprocessed_spectra.mgf --config=casanovo/config.yaml
This error shows:
File "D:\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 1497, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Spec2Pep:
size mismatch for decoder.aa_encoder.weight: copying a param with shape torch.Size([25, 512]) from checkpoint, the shape in current model is torch.Size([29, 512]).
size mismatch for decoder.final.weight: copying a param with shape torch.Size([25, 512]) from checkpoint, the shape in current model is torch.Size([29, 512]).
size mismatch for decoder.final.bias: copying a param with shape torch.Size([25]) from checkpoint, the shape in current model is torch.Size([29]).
Did you recently update depthcharge-ms, the pretrained weights, the sample .mgf file or config.yaml?
Shall I try any older version of any of these files? Thanks!
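All three mismatched tensors in the error differ only in their first dimension (25 in the checkpoint, 29 in the current model), which is what one would expect if the model were built with a different number of output tokens than the checkpoint was trained with. That reading is an inference from the shapes, not a confirmed diagnosis. The sketch below reproduces the comparison with plain tuples standing in for tensor shapes; shape_mismatches is a hypothetical helper.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for differing entries."""
    return {
        name: (checkpoint_shapes[name], model_shapes[name])
        for name in checkpoint_shapes
        if name in model_shapes and checkpoint_shapes[name] != model_shapes[name]
    }


# Shapes copied from the error message above.
checkpoint = {
    "decoder.aa_encoder.weight": (25, 512),
    "decoder.final.weight": (25, 512),
    "decoder.final.bias": (25,),
}
current_model = {
    "decoder.aa_encoder.weight": (29, 512),
    "decoder.final.weight": (29, 512),
    "decoder.final.bias": (29,),
}

mismatches = shape_mismatches(checkpoint, current_model)

# Difference in the leading dimension for every mismatched parameter.
row_deltas = {name: new[0] - old[0] for name, (old, new) in mismatches.items()}
```

Every parameter is off by the same four rows, so comparing the residue or token list in the config against the one used for the released weights would be a reasonable first check.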
I just tried to run the "small walkthrough" that's in the Casanovo readme, but it didn't work. I thought it was strange that the example makes no mention of the yaml configuration file. Sure enough, when I ran the example, I got an error message saying it couldn't find that file:
casanovo --mode=denovo --model_path=../../../data/22-07-02_weights/pretrained_excl_mouse.ckpt --test_data_path=mgf
/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 9010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/bin/casanovo", line 8, in <module>
sys.exit(main())
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/casanovo.py", line 58, in main
with open(abs_path) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/config.yaml'
We need to update the README to describe how to set up the config file.
See validation data in the below line.
casanovo/casanovo/denovo/train_test.py
Line 25 in a7cefa2
Hi, I failed to pip install casanovo and got many errors.
Is there a way to run this project directly with Python?
This can wait until we have release notes and versions for Casanovo
Make sure that the YAML config values are parsed as the correct type. I.e. learning rate 5e-4 should be a float, not a string.
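Some YAML loaders that follow YAML 1.1 (PyYAML, for example) read 5e-4 without a decimal point as a string, so one way to handle this is to cast values after loading. The sketch below assumes the config has already been loaded into a dict; the key names and the coerce_config helper are hypothetical and do not describe Casanovo's actual config handling.

```python
def coerce_config(raw, expected_types):
    """Return a copy of raw with each known key cast to its expected type.

    Hypothetical helper; key names below are illustrative only.
    """
    coerced = dict(raw)
    for key, cast in expected_types.items():
        if key in coerced:
            coerced[key] = cast(coerced[key])
    return coerced


# Values as they might come out of a loader that returned strings.
raw = {"learning_rate": "5e-4", "train_batch_size": "64", "mode": "train"}
config = coerce_config(raw, {"learning_rate": float, "train_batch_size": int})
```

Writing the value as 5.0e-4 in the YAML file is another option with such loaders, but casting explicitly keeps the behavior independent of how the user formats the number.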
The documentation should have detailed information about how to configure casanovo.
Add a bullet list of what was added and when at the bottom of README.md, e.g. release 1.1 22-06-03: - config file format changed, etc. ...
Greetings,
Firstly, I would like to thank you for providing this open source tool. I am currently interested in performing de novo sequencing without evaluation. Your GitHub page shows the following command to run:
casanovo --mode=denovo --model_path='path/to/pretrained' --test_data_path='path/to/test' --config_path='path/to/config' --output_path='path/to/output'
Regards,
Ben
Hi @melihyilmaz,
One minor thing - in casanovo/denovo/train_test.py, the f.suffix.lower() part is consistently throwing an error, saying str doesn't have attribute suffix. I have changed it to os.path.splitext(f)[1].lower() instead and it works fine.
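The error arises because .suffix is an attribute of pathlib.Path objects, not of plain strings. A small sketch of an extension check that accepts either type is shown below, alongside the os.path.splitext workaround from the report; the function name is_mgf is hypothetical.

```python
import os
from pathlib import Path


def is_mgf(f):
    """Return True if f (a str or a Path) has an .mgf extension, ignoring case."""
    # Wrapping in Path() makes .suffix available even when f is a str.
    return Path(f).suffix.lower() == ".mgf"


from_str = is_mgf("play_train_data/train.MGF")
from_path = is_mgf(Path("play_val_data") / "val.mgf")
renamed = is_mgf("val.mgf.txt")  # only the last extension counts

# The workaround from the report gives the same extension for a str input.
splitext_ext = os.path.splitext("play_train_data/train.MGF")[1].lower()
```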
On another note, I am trying to train on a new dataset and am running into an issue with how the number of batches is calculated. I set the training and validation batch sizes to both be 64 in the config file.
However, when I do this with the attached files as train and validation data respectively, using the command
casanovo --mode=train --model_path=/home/ubuntu/darkspectra/casanovo_pretrained_model_weights/pretrained_excl_human.ckpt --train_data_path=/home/ubuntu/play_train_data/ --val_data_path=/home/ubuntu/play_val_data
where I put train.mgf.txt in play_train_data and val.mgf.txt in play_val_data (and rename them back to .mgf), I am seeing that the progress bar computes the number of train batches based on the number of combined spectra between the two files ((1698+713)/64 = 38, as opposed to the desired 27).
Can you replicate this behavior with these files? Thank you very much.
EDIT: It doesn't seem to be an issue with the number of batches like I suspected but rather with the validation data, since I tried swapping out the validation data with just the same training data file and it seems to work...
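The batch counts quoted above can be checked with a few lines of arithmetic. This sketch assumes the last, partially filled batch is kept (ceiling division) and only reproduces the numbers; it does not show which of them the progress bar actually reports.

```python
import math


def num_batches(n_spectra, batch_size):
    """Number of batches when the final partial batch is kept."""
    return math.ceil(n_spectra / batch_size)


batch_size = 64
train_batches = num_batches(1698, batch_size)           # training file alone
val_batches = num_batches(713, batch_size)              # validation file alone
combined_batches = num_batches(1698 + 713, batch_size)  # both files pooled

print(train_batches, val_batches, combined_batches)
```

Pooling the spectra gives 38 batches, while counting the two files separately gives 27 and 12.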