We're getting an interesting error when trying to run the GPU code on larger datasets. The command and full traceback follow:
# Launch a small single-GPU deepblast training run.
workers=30
nodes=1
# NOTE(review): $DIR is assumed to be set by the environment/caller to the
# dataset directory holding train.txt/test.txt/valid.txt — confirm before running.
RESULTS="results/small_run_w${workers}_n${nodes}"
# Quote all path expansions so runs don't break on paths containing
# spaces or glob characters.
deepblast-train \
    --train-pairs "$DIR/train.txt" \
    --test-pairs "$DIR/test.txt" \
    --valid-pairs "$DIR/valid.txt" \
    --output-directory "$RESULTS" \
    --num-workers "$workers" \
    --learning-rate 1e-4 \
    --visualization-fraction 0.01 \
    --batch-size 24 \
    --grad-accum 16 \
    --gpus 1
Warning: Error detected in torch::autograd::GraphRoot. Traceback of forward call that caused the error:
File "/home/jmorton/miniconda3/envs/alignment/bin/deepblast-train", line 7, in <module>
exec(compile(f.read(), __file__, 'exec'))
File "/home/jmorton/research/gert/deepblast/scripts/deepblast-train", line 67, in <module>
main(hparams)
File "/home/jmorton/research/gert/deepblast/scripts/deepblast-train", line 47, in main
trainer.fit(model)
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 976, in fit
results = self.single_gpu_train(model)
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 186, in single_gpu_train
results = self.run_pretrain_routine(model)
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1180, in run_pretrain_routine
self.train()
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 370, in train
self.run_training_epoch()
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 452, in run_training_epoch
batch_output = self.run_training_batch(batch, batch_idx)
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 627, in run_training_batch
opt_closure_result = self.optimizer_closure(
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 775, in optimizer_closure
training_step_output = self.training_forward(split_batch, batch_idx, opt_idx,
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 946, in training_forward
output = self.model.training_step(*args)
File "/home/jmorton/research/gert/deepblast/deepblast/trainer.py", line 80, in training_step
predA = self.aligner(x, y)
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/jmorton/research/gert/deepblast/deepblast/alignment.py", line 79, in forward
aln = self.nw.decode(theta, A)
File "/home/jmorton/research/gert/deepblast/deepblast/nw_cuda.py", line 304, in decode
v_grad, _ = torch.autograd.grad(v, (theta, A), create_graph=True)
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/autograd/__init__.py", line 156, in grad
return Variable._execution_engine.run_backward(
(print_stack at /opt/conda/conda-bld/pytorch_1591914886554/work/torch/csrc/autograd/python_anomaly_mode.cpp:60)
Traceback (most recent call last):
File "/home/jmorton/miniconda3/envs/alignment/bin/deepblast-train", line 7, in <module>
exec(compile(f.read(), __file__, 'exec'))
File "/home/jmorton/research/gert/deepblast/scripts/deepblast-train", line 67, in <module>
main(hparams)
File "/home/jmorton/research/gert/deepblast/scripts/deepblast-train", line 47, in main
trainer.fit(model)
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 976, in fit
results = self.single_gpu_train(model)
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 186, in single_gpu_train
results = self.run_pretrain_routine(model)
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1180, in run_pretrain_routine
self.train()
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 370, in train
self.run_training_epoch()
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 452, in run_training_epoch
batch_output = self.run_training_batch(batch, batch_idx)
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 627, in run_training_batch
opt_closure_result = self.optimizer_closure(
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 775, in optimizer_closure
training_step_output = self.training_forward(split_batch, batch_idx, opt_idx,
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 946, in training_forward
output = self.model.training_step(*args)
File "/home/jmorton/research/gert/deepblast/deepblast/trainer.py", line 80, in training_step
predA = self.aligner(x, y)
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/jmorton/research/gert/deepblast/deepblast/alignment.py", line 79, in forward
aln = self.nw.decode(theta, A)
File "/home/jmorton/research/gert/deepblast/deepblast/nw_cuda.py", line 304, in decode
v_grad, _ = torch.autograd.grad(v, (theta, A), create_graph=True)
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/autograd/__init__.py", line 156, in grad
return Variable._execution_engine.run_backward(
RuntimeError: CUDA error: an illegal memory access was encountered (operator() at /opt/conda/conda-bld/pytorch_1591914886554/work/aten/src/ATen/native/cuda/CUDAScalar.cu:19)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x2aaaf9925b5e in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x2111a53 (0x2aaad4623a53 in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #2: at::native::_local_scalar_dense_cuda(at::Tensor const&) + 0x27 (0x2aaad4625157 in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0xdd2280 (0x2aaad32e4280 in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xe22b9d (0x2aaacdc1ab9d in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x27f3c99 (0x2aaacf5ebc99 in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0xe22b9d (0x2aaacdc1ab9d in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #7: at::native::item(at::Tensor const&) + 0xc9c (0x2aaacd9187bc in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0xe997e0 (0x2aaacdc917e0 in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x283f42b (0x2aaacf63742b in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0xe22b9d (0x2aaacdc1ab9d in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #11: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0xc4c (0x2aaacf8dc64c in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #12: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x2aaacf8dded2 in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::Engine::thread_init(int) + 0x39 (0x2aaacf8d6549 in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x2aaacc125b08 in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #15: <unknown function> + 0xc8163 (0x2aaac9ccf163 in /home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #16: <unknown function> + 0x7ea5 (0x2aaaaacd6ea5 in /lib64/libpthread.so.0)
frame #17: clone + 0x6d (0x2aaaaafe98dd in /lib64/libc.so.6)
Exception ignored in: <function tqdm.__del__ at 0x2aab07342ca0>
Traceback (most recent call last):
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/tqdm/std.py", line 1086, in __del__
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/tqdm/std.py", line 1293, in close
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/tqdm/std.py", line 1471, in display
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/tqdm/std.py", line 1089, in __repr__
File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/tqdm/std.py", line 1433, in format_dict
TypeError: cannot unpack non-iterable NoneType object