magics-lab / dnabert_2
[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome
License: Apache License 2.0
Hi,
Thanks a lot for uploading DNABERT2.
I noticed a small mistake when using LoRA: the default lora_target_modules value, ["query", "value"], does not seem to be correct. Through trial and error, I found that ["q", "v"] worked for me.
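Since the right lora_target_modules value depends on how the layers are actually named in the loaded model, one way to avoid trial and error is to list the module names and pick from the suffixes that repeat once per layer. A minimal sketch: `candidate_target_modules` is a hypothetical helper, and the example names below are made up for illustration, not read from DNABERT-2.

```python
from collections import Counter

def candidate_target_modules(module_names):
    """Count the last dotted component of every module name.

    `module_names` is any iterable of dotted names, for example
    [name for name, _ in model.named_modules()] on a PyTorch model.
    """
    counts = Counter()
    for name in module_names:
        if name:  # named_modules() yields "" for the root module
            counts[name.rsplit(".", 1)[-1]] += 1
    return counts

# Made-up names for illustration only (not read from DNABERT-2).
example_names = [
    "",
    "encoder.layer.0.attention.q",
    "encoder.layer.0.attention.v",
    "encoder.layer.1.attention.q",
    "encoder.layer.1.attention.v",
]
suffix_counts = candidate_target_modules(example_names)
print(sorted(suffix_counts.items()))
```

With a real model, the suffixes printed here are the strings to compare against whatever is passed as lora_target_modules.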
Hi,
Thanks for making DNABERT2 available!
I want to prepare embeddings of potentially long sequences for downstream use. How would you recommend I do that?
A) Taking the sequence as-is and embedding it all at once (I guess this is technically possible with ALiBi?)
B) Chunking the sequence into smaller pieces, as done for pretraining, and then concatenating the embeddings (128 nucleotides or 128 BPE tokens? I'm not sure.)
Would appreciate your help!
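Option B could be sketched roughly as follows. The window size and overlap are placeholder values (whether to use 128 nucleotides or 128 BPE tokens is left open above), and `chunk_sequence` is a hypothetical helper, not part of DNABERT-2. Each chunk would then be tokenized and embedded separately before the embeddings are concatenated or pooled.

```python
def chunk_sequence(seq, window, overlap=0):
    """Split a nucleotide string into windows of `window` bases.

    Consecutive windows share `overlap` bases; the last window may be shorter.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    chunks = []
    for start in range(0, len(seq), step):
        chunks.append(seq[start:start + window])
        if start + window >= len(seq):
            break  # this window already reaches the end of the sequence
    return chunks

chunks = chunk_sequence("ACGTACGTAC", window=4, overlap=2)
print(chunks)
```

Overlapping windows are one way to reduce edge effects at chunk boundaries; whether that helps for a given downstream task is an empirical question.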
Dear Zhihan,
Thank you for the great contribution to the genome foundation model field. I am quite interested in the "Functional variants identification with DNABERT-2" section. However, I am not sure whether the "predicted high-attention regions" come from the pre-trained model or from a fine-tuned model. If they come from a fine-tuned model, which sub-task is it from?
We applied DNABERT to identify functional variants using around 700 million short variants in dbSNP (Sherry, 2001). Specifically, we selected only those variants that are located inside DNABERT predicted high-attention regions and repeated the predictions, using sequences with altered alleles.
Thanks.
Shicheng
I think I get the point now. It is from the fine-tuning stage.
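The "altered alleles" step quoted above amounts to running the model on a reference sequence and again on the same sequence with the variant substituted. A minimal sketch of building that pair for a single-nucleotide variant: `apply_snv` and the 0-based position convention are assumptions for illustration and do not come from the DNABERT code.

```python
def apply_snv(seq, pos, ref, alt):
    """Return `seq` with the base at 0-based index `pos` changed from `ref` to `alt`."""
    if not 0 <= pos < len(seq):
        raise IndexError("variant position is outside the sequence")
    if seq[pos] != ref:
        raise ValueError(
            f"expected reference allele {ref!r} at {pos}, found {seq[pos]!r}"
        )
    return seq[:pos] + alt + seq[pos + 1:]

reference = "ACGTTGCA"
altered = apply_snv(reference, pos=3, ref="T", alt="C")
# Both `reference` and `altered` would then be scored by the fine-tuned
# model and the two predictions compared.
print(reference, altered)
```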
I am trying to run the fine-tuning example from your GitHub repository, and I keep getting the error below. The quickstart section runs successfully.
The provided data_path is /uufs/chpc.utah.edu/common/home/u1323098/sundar-group-space2/PHAGE/MODELS/GUE/prom/prom_300_tata
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
/uufs/chpc.utah.edu/common/home/u1323098/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:125: UserWarning: Unable to import Triton; defaulting MosaicBERT attention implementation to pytorch (this will reduce throughput when using this model).
warnings.warn(
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['classifier.bias', 'classifier.weight', 'bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using cuda_amp half precision backend
***** Running training *****
Num examples = 4,904
Num Epochs = 3
Instantaneous batch size per device = 32
Total train batch size (w. parallel, distributed & accumulation) = 32
Gradient Accumulation steps = 1
Total optimization steps = 462
Number of trainable parameters = 117,070,082
43%|████████████████████████████████████████████████████████████▎ | 199/462 [00:12<00:12, 21.47it/s]
***** Running Evaluation *****
Num examples = 613
Batch size = 32
Traceback (most recent call last):
File "train.py", line 286, in <module>
train()
File "train.py", line 268, in train
trainer.train()
File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2006, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2287, in _maybe_log_save_evaluate
metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2993, in evaluate
output = eval_loop(
File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 3281, in evaluation_loop
metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
File "train.py", line 204, in compute_metrics
return calculate_metric_with_sklearn(logits, labels)
File "train.py", line 190, in calculate_metric_with_sklearn
predictions = np.argmax(logits, axis=-1)
File "<__array_function__ internals>", line 200, in argmax
File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 1242, in argmax
return _wrapfunc(a, 'argmax', axis=axis, out=out, **kwds)
File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
return _wrapit(obj, method, *args, **kwds)
File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 613) + inhomogeneous part.
43%|████████████████████████████████████████████████████████████▌ | 200/462 [00:12<00:16, 15.91it/s]
Here are the installation steps that work on my system (University of Utah CHPC):
salloc --account=soc-gpu-np --partition=soc-gpu-np --nodes=1 --gres=gpu:a100:1
conda activate dna
cd sundar-group-space2/PHAGE/MODELS/DNABERT_2/
python3 -m pip install -r requirements.txt
pip uninstall triton
Previously, I had changed requirements.txt to read:
einops
transformers==4.28.0
peft
omegaconf
torch
evaluate
accelerate
I also had to change the learning rate from ${lr} to 1e-4 in run_dnabert2.sh.
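The detected shape (2, 613) in the error above suggests that `logits` reaches calculate_metric_with_sklearn as a tuple of two arrays (one entry per evaluation example in each) rather than as a single array, so np.argmax cannot stack it. One possible workaround, assuming the classification logits are the first element of that tuple, is to unwrap it before taking the argmax. `unwrap_logits` is a hypothetical helper, and the arrays below are random stand-ins, not real model output.

```python
import numpy as np

def unwrap_logits(predictions):
    """Return the logits array, taking element 0 if a tuple was passed."""
    if isinstance(predictions, tuple):
        predictions = predictions[0]  # assumption: logits come first
    return np.asarray(predictions)

rng = np.random.default_rng(0)
fake_logits = rng.normal(size=(613, 2))     # [examples, classes]
fake_extra = rng.normal(size=(613, 4, 8))   # second output with another shape
predicted = np.argmax(unwrap_logits((fake_logits, fake_extra)), axis=-1)
print(predicted.shape)
```

The same unwrapping could instead be done in a function passed to the Trainer's preprocess_logits_for_metrics argument, which also keeps the larger second output from being accumulated during evaluation.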
Hi,
Any estimate when the pre-training code will be available?
Thanks
Hi @Zhihan1996, thanks for providing the code for fine-tuning DNABERT2. However, there is no mention of how to generate dev.csv, test.csv and train.csv from our own dataset, or how to assign the labels 1 and 0 to the sequences. Could you please let me know how to do that?
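A rough sketch of producing the three files from a list of labelled sequences. It assumes the fine-tuning script reads CSV files with a `sequence,label` header row and that the files are named train.csv, dev.csv and test.csv; both should be checked against the repository README. The 80/10/10 split, the `write_splits` helper and the toy records are illustrative only, and the files are written to a temporary directory here.

```python
import csv
import random
import tempfile
from pathlib import Path

def write_splits(records, out_dir, seed=0, train_frac=0.8, dev_frac=0.1):
    """Shuffle (sequence, label) pairs and write train/dev/test CSV files."""
    records = list(records)
    random.Random(seed).shuffle(records)
    n_train = int(len(records) * train_frac)
    n_dev = int(len(records) * dev_frac)
    splits = {
        "train.csv": records[:n_train],
        "dev.csv": records[n_train:n_train + n_dev],
        "test.csv": records[n_train + n_dev:],
    }
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for filename, rows in splits.items():
        with open(out_dir / filename, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["sequence", "label"])  # assumed header row
            writer.writerows(rows)
    return {name: len(rows) for name, rows in splits.items()}

# Toy data: label 1 for positive examples, 0 for negative ones.
toy_records = [("ACGT" * 5, 1), ("TTGA" * 5, 0)] * 5
output_dir = tempfile.mkdtemp()
sizes = write_splits(toy_records, output_dir)
print(sizes)
```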
Hi! Very interesting work!
Could you please provide some details about the datasets in GUE?
For example, the splicing dataset has labels 0, 1 and 2, which should correspond to donor, acceptor and non-splicing sites, but which is which?
There are 5 subfolders in each of the tf and mouse folders, but which transcription factors correspond to folders 0 - 4?
I am trying to extract the hidden-layer output from all layers of the model. According to the documentation, output_all_encoded_layers is a "boolean which controls the content of the encoded_layers output as described below. Default: True". However, line 586 (https://huggingface.co/zhihan1996/DNABERT-2-117M/blob/main/bert_layers.py#L586) sets it to False, which matches what I observed (only the last layer was returned in the output) but contradicts the documentation. When I set it to True, inference fails. The traceback is as follows:
RuntimeError Traceback (most recent call last)
Cell In[60], line 1
----> 1 output = model(**b, output_all_encoded_layers=True)
File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)
File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None
File /lustre/scratch124/casm/team113/users/pg20/data/supporting/huggingface_models/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py:616, in BertModel.forward(self, input_ids, token_type_ids, attention_mask, position_ids, output_all_encoded_layers, masked_tokens_mask, **kwargs)
614 if masked_tokens_mask is None:
615 sequence_output = encoder_outputs[-1]
--> 616 pooled_output = self.pooler(
617 sequence_output) if self.pooler is not None else None
618 else:
619 # TD [2022-03-01]: the indexing here is very tricky.
620 attention_mask_bool = attention_mask.bool()
File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)
File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None
File /lustre/scratch124/casm/team113/users/pg20/data/supporting/huggingface_models/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py:501, in BertPooler.forward(self, hidden_states, pool)
495 def forward(self,
496 hidden_states: torch.Tensor,
497 pool: Optional[bool] = True) -> torch.Tensor:
498 # We "pool" the model by simply taking the hidden state corresponding
499 # to the first token.
500 first_token_tensor = hidden_states[:, 0] if pool else hidden_states
--> 501 pooled_output = self.dense(first_token_tensor)
502 pooled_output = self.activation(pooled_output)
503 return pooled_output
File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)
File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None
File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x5 and 768x768)
Steps to reproduce the error:
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
b = tokenizer('ATCG', return_tensors='pt', return_attention_mask=True)
output = model(**b, output_all_encoded_layers=True)
P.S. I am not using triton since it was failing in another step.
We tried to test DNABERT-2 on an AWS EC2 p2.xlarge instance with Ubuntu, CUDA 11.5 and gcc version 9 (we also tried version 11).
Every attempt failed.
We set up the environment using the requirements.txt posted on GitHub, but it still did not work.
The failure occurs at the command: hidden_states = model(inputs)[0] # [1, sequence_length, 768]
>>> hidden_states = model(inputs)[0] # [1, sequence_length, 768]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Traceback (most recent call last):
File "<string>", line 21, in _fwd_kernel
KeyError: ('2-.-0-.-0--d6252949da17ceb5f3a278a70250af13-3b85c7bef5f0a641282f3b73af50f599-14de7de5c4da5794c8ca14e7e41a122d-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.float16, torch.float16, torch.float16, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, True, 128, 128), (True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (True, False), (True, False)))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 937, in build_triton_ir
generator.visit(fn.parse())
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
return super().visit(node)
File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
return visitor(node)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 183, in visit_Module
ast.NodeVisitor.generic_visit(self, node)
File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 379, in generic_visit
self.visit(item)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
return super().visit(node)
File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
return visitor(node)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 252, in visit_FunctionDef
has_ret = self.visit_compound_statement(node.body)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 177, in visit_compound_statement
self.last_ret_type = self.visit(stmt)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
return super().visit(node)
File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
return visitor(node)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 678, in visit_For
self.visit_compound_statement(node.body)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 177, in visit_compound_statement
self.last_ret_type = self.visit(stmt)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
return super().visit(node)
File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
return visitor(node)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 319, in visit_AugAssign
self.visit(assign)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
return super().visit(node)
File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
return visitor(node)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 301, in visit_Assign
values = self.visit(node.value)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
return super().visit(node)
File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
return visitor(node)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 339, in visit_BinOp
rhs = self.visit(node.right)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
return super().visit(node)
File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
return visitor(node)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 797, in visit_Call
return fn(*args, _builder=self.builder, **kws)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/impl/base.py", line 22, in wrapper
return fn(*args, **kwargs)
TypeError: dot() got an unexpected keyword argument 'trans_b'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 608, in forward
encoder_outputs = self.encoder(
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 446, in forward
hidden_states = layer_module(hidden_states,
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 327, in forward
attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 240, in forward
self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 181, in forward
attention = flash_attn_qkvpacked_func(qkv, bias)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line1021, in forward
o, lse, ctx.softmax_scale = _flash_attn_forward(
File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line826, in _flash_attn_forward
_fwd_kernel[grid]( # type: ignore
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 90, in run
return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 199, in run
return self.fn.run(*args, **kwargs)
File "<string>", line 41, in _fwd_kernel
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 1621, in compile
next_module = compile(module)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 1550, in <lambda>
lambda src: ast_to_ttir(src, signature, configs[0], constants)),
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 962, in ast_to_ttir
mod, _ = build_triton_ir(fn, signature, specialization, constants)
File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 942, in build_triton_ir
raise CompilationError(fn.src, node) from e
triton.compiler.CompilationError: at 114:24:
def _fwd_kernel(
Q,
K,
V,
Bias,
Out,
Lse,
TMP, # NOTE: TMP is a scratchpad buffer to workaround a compiler bug
softmax_scale,
stride_qb,
stride_qh,
stride_qm,
stride_kb,
stride_kh,
stride_kn,
stride_vb,
stride_vh,
stride_vn,
stride_bb,
stride_bh,
stride_bm,
stride_ob,
stride_oh,
stride_om,
nheads,
seqlen_q,
seqlen_k,
seqlen_q_rounded,
headdim,
CACHE_KEY_SEQLEN_Q,
CACHE_KEY_SEQLEN_K,
BIAS_TYPE: tl.constexpr,
IS_CAUSAL: tl.constexpr,
BLOCK_HEADDIM: tl.constexpr,
EVEN_M: tl.constexpr,
EVEN_N: tl.constexpr,
EVEN_HEADDIM: tl.constexpr,
BLOCK_M: tl.constexpr,
BLOCK_N: tl.constexpr,
):
start_m = tl.program_id(0)
off_hb = tl.program_id(1)
off_b = off_hb // nheads
off_h = off_hb % nheads
# off_b = tl.program_id(1)
# off_h = tl.program_id(2)
# off_hb = off_b * nheads + off_h
# initialize offsets
offs_m = start_m * BLOCK_M + tl.arange(0, BLOCK_M)
offs_n = tl.arange(0, BLOCK_N)
offs_d = tl.arange(0, BLOCK_HEADDIM)
# Initialize pointers to Q, K, V
# Adding parenthesis around indexing might use int32 math instead of int64 math?
# https://github.com/openai/triton/issues/741
# I'm seeing a tiny bit of difference (5-7us)
q_ptrs = Q + off_b * stride_qb + off_h * stride_qh + (
offs_m[:, None] * stride_qm + offs_d[None, :])
k_ptrs = K + off_b * stride_kb + off_h * stride_kh + (
offs_n[:, None] * stride_kn + offs_d[None, :])
v_ptrs = V + off_b * stride_vb + off_h * stride_vh + (
offs_n[:, None] * stride_vn + offs_d[None, :])
if BIAS_TYPE == 'vector':
b_ptrs = Bias + off_b * stride_bb + off_h * stride_bh + offs_n
elif BIAS_TYPE == 'matrix':
b_ptrs = Bias + off_b * stride_bb + off_h * stride_bh + (
offs_m[:, None] * stride_bm + offs_n[None, :])
else:
raise ValueError("BIAS_TYPE must be one of {'vector', 'matrix'}")
# initialize pointer to m and l
t_ptrs = TMP + off_hb * seqlen_q_rounded + offs_m
lse_i = tl.zeros([BLOCK_M], dtype=tl.float32) - float('inf')
m_i = tl.zeros([BLOCK_M], dtype=tl.float32) - float('inf')
acc_o = tl.zeros([BLOCK_M, BLOCK_HEADDIM], dtype=tl.float32)
# load q: it will stay in SRAM throughout
# [2022-10-30] TD: Triton bug - in the case of EVEN_M=True and EVEN_N=False, if we just call
# tl.load(q_ptrs), we get the wrong output!
if EVEN_M & EVEN_N:
if EVEN_HEADDIM:
q = tl.load(q_ptrs)
else:
q = tl.load(q_ptrs, mask=offs_d[None, :] < headdim, other=0.0)
else:
if EVEN_HEADDIM:
q = tl.load(q_ptrs, mask=offs_m[:, None] < seqlen_q, other=0.0)
else:
q = tl.load(q_ptrs,
mask=(offs_m[:, None] < seqlen_q) &
(offs_d[None, :] < headdim),
other=0.0)
# loop over k, v and update accumulator
end_n = seqlen_k if not IS_CAUSAL else tl.minimum(
(start_m + 1) * BLOCK_M, seqlen_k)
for start_n in range(0, end_n, BLOCK_N):
start_n = tl.multiple_of(start_n, BLOCK_N)
# -- compute qk ----
if EVEN_N & EVEN_M: # If we just do "if EVEN_N", there seems to be some race condition
if EVEN_HEADDIM:
k = tl.load(k_ptrs + start_n * stride_kn)
else:
k = tl.load(k_ptrs + start_n * stride_kn,
mask=offs_d[None, :] < headdim,
other=0.0)
else:
if EVEN_HEADDIM:
k = tl.load(k_ptrs + start_n * stride_kn,
mask=(start_n + offs_n)[:, None] < seqlen_k,
other=0.0)
else:
k = tl.load(k_ptrs + start_n * stride_kn,
mask=((start_n + offs_n)[:, None] < seqlen_k) &
(offs_d[None, :] < headdim),
other=0.0)
qk = tl.zeros([BLOCK_M, BLOCK_N], dtype=tl.float32)
qk += tl.dot(q, k, trans_b=True)
^
>>>
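The trans_b failure is raised inside Triton while it compiles the attention kernel, which suggests a mismatch between the installed Triton version and the one the bundled flash_attn_triton.py was written for. An earlier report on this page shows the model falling back to a PyTorch attention implementation when Triton cannot be imported, and that user also ran `pip uninstall triton`. A small check of what is installed, using only the standard library (`installed_version` is a hypothetical helper):

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

triton_version = installed_version("triton")
if triton_version is None:
    print("triton is not installed")
else:
    print(f"triton {triton_version} is installed")
```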
Hello, I was just wondering whether you know when the pre-training code will be available on your GitHub. Also, is it very different from, or similar to, the pre-training code you have provided for DNABERT2?
Thank you for any assistance.
LeAnn
Would the authors consider adding an implementation of _no_split_modules to offer multi-GPU options in Lightning for those wanting to use the embeddings?
Part 1: Tokenization and Dataset Preparation
from transformers import AutoTokenizer, BertForSequenceClassification
from torch.utils.data import Dataset
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = BertForSequenceClassification.from_pretrained("zhihan1996/DNABERT-2-117M", num_labels=8)
class DNADataset(Dataset):
    def __init__(self, data, tokenizer):
        self.data = data
        self.tokenizer = tokenizer
    def __len__(self):
        return len(self.data)
    def __getitem__(self, idx):
        seq, label = self.data[idx]
        inputs = self.tokenizer(seq, return_tensors='pt', padding='max_length', max_length=600, truncation=True)
        return {
            'input_ids': inputs["input_ids"].squeeze(),
            'label': label
        }
Part 2: Retrieving Model Configuration
from transformers import AutoConfig
config = AutoConfig.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
print(config.max_position_embeddings)
Firstly, thank you so much for this implementation - this is really useful!
Is there still an input length constraint in the pretrained model, though? I noticed that when I feed in a sequence that generates more than 512 tokens (which was the original maximum BERT input sequence length) the model fails to generalize. Is this expected behaviour? Error given below -
If yes, then would you have any recommendations for dealing with sequences that generate more than 512 tokens?
Cell In[73], line 1
----> 1 dnabert(input_ids, attention_mask, token_type_ids)
File ~/predictor/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/predictor/venv/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:1015, in BertModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
1008 # Prepare head mask if needed
1009 # 1.0 in head_mask indicate we keep the head
1010 # attention_probs has shape bsz x n_heads x N x N
1011 # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
1012 # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
1013 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
-> 1015 embedding_output = self.embeddings(
1016 input_ids=input_ids,
1017 position_ids=position_ids,
1018 token_type_ids=token_type_ids,
1019 inputs_embeds=inputs_embeds,
1020 past_key_values_length=past_key_values_length,
1021 )
1022 encoder_outputs = self.encoder(
1023 embedding_output,
1024 attention_mask=extended_attention_mask,
(...)
1032 return_dict=return_dict,
1033 )
1034 sequence_output = encoder_outputs[0]
File ~/predictor/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/predictor/venv/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:238, in BertEmbeddings.forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
236 if self.position_embedding_type == "absolute":
237 position_embeddings = self.position_embeddings(position_ids)
--> 238 embeddings += position_embeddings
239 embeddings = self.LayerNorm(embeddings)
240 embeddings = self.dropout(embeddings)
RuntimeError: The size of tensor a (514) must match the size of tensor b (512) at non-singleton dimension 1
Also, would you have any examples of finetuning on our own datasets/a regression dataset coming up soon?
Hi, I tried to get the logits of an input sequence using DNABERT. Here is my code:
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model.eval().cuda()
seq = "TCCCACTATTTGTCGGCTAGCCAGATTGTTGTGGTCTGATTAAAGTT\
TCAATTTATACCTTACAATGATGTAAGGTACGTGTAAGAGAAATCGATGGGATA\
TTTTTTTACAACAAGGTATTCTTAAAGTAAGAGTTATACGCTATGTGGAAAAGAGGTGTTTAAG"
tokens_ids = tokenizer.batch_encode_plus([seq], return_tensors="pt")["input_ids"]
attention_mask = tokens_ids != tokenizer.pad_token_id
torch_outs = model(tokens_ids.cuda(),
attention_mask=attention_mask.cuda(),
encoder_attention_mask=attention_mask.cuda(),
output_hidden_states=True,
labels=tokens_ids.cuda())
The torch_outs returned is a tuple of 2 tensors: torch.Size([1, 39, 768]) and torch.Size([1, 768]). I assume the first holds the embeddings for each token, and the second is a pooled embedding of the sequence?
Is it possible for DNABERT to return logits for each token?
Hi, I tried following the README with
from transformers import AutoModel, AutoTokenizer
# Load the model and tokenizer
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M")
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M")
This does not work correctly with ALiBi, as it doesn't use the model class defined at https://huggingface.co/zhihan1996/DNABERT-2-117M/blob/main/bert_layers.py
Rather, it uses the default Hugging Face BERT class, which, if I understand correctly, does not have the ALiBi utilities such as rebuild_alibi_tensor.
I encountered the error by trying to run a sequence longer than 512. Should the model be loaded another way to reproduce it correctly?
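Independently of which model class ends up being loaded, an input can be kept under a fixed token budget by windowing its token ids before the forward pass. The following is only a minimal sketch: the helper name `window_token_ids`, the budget of 512, and the choice to ignore special tokens are all assumptions made for illustration, not something taken from the DNABERT-2 code.

```python
from typing import List


def window_token_ids(token_ids: List[int], max_len: int = 512,
                     stride: int = 512) -> List[List[int]]:
    """Split token ids into windows of at most `max_len` tokens.

    `stride` is the step between window starts; a stride smaller than
    `max_len` yields overlapping windows. (Hypothetical helper.)
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    windows = []
    for start in range(0, len(token_ids), stride):
        windows.append(token_ids[start:start + max_len])
        # Stop once a window has reached the end of the sequence.
        if start + max_len >= len(token_ids):
            break
    return windows


# 514 ids, the length reported in the size-mismatch error in this thread.
ids = list(range(514))
chunks = window_token_ids(ids, max_len=512, stride=512)
print([len(c) for c in chunks])  # [512, 2]
```

Each window can then be embedded separately; how to combine the per-window embeddings (mean, concatenation, etc.) is a separate design choice that this sketch leaves open.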
Hi Zhihan,
I am wondering what you think about downstream association studies that identify phenotype-associated variants/regions with a Genome Foundation Model. Do you think the current model is suitable for such a task? If not, do you have any comments on how to develop a good model for it?
Thanks
Shicheng
ValueError Traceback (most recent call last)
in <cell line: 1>()
----> 1 model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
1 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py in register(cls, config_class, model_class, exist_ok)
534 """
535 if hasattr(model_class, "config_class") and model_class.config_class != config_class:
--> 536 raise ValueError(
537 "The model class you are passing has a config_class attribute that is not consistent with the "
538 f"config class you passed (model has {model_class.config_class} and you passed {config_class}. Fix "
ValueError: The model class you are passing has a config_class attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.81ac6a98387cf94bc283553260f3fa6b88cef2fa.configuration_bert.BertConfig'>. Fix one of those so they match!
Can you please help me out?
Hi, I tried to run the fine-tuning with train.py for the first time. I got an error at the following lines:
parser = transformers.HfArgumentParser((ModelArguments, DataArguments, TrainingArguments))
model_args, data_args, training_args = parser.parse_args_into_dataclasses()
Below is the error message I got:
ValueError: (Field(name=None,type=None,default='steps',default_factory=<dataclasses._MISSING_TYPE object at 0x0000016635086850>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=None),) is not a valid IntervalStrategy
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\MiniConda\envs\dnabert\lib\enum.py", line 670, in __new__
raise exc
File "C:\ProgramData\MiniConda\envs\dnabert\lib\enum.py", line 653, in __new__
result = cls._missing_(value)
File "C:\ProgramData\MiniConda\envs\dnabert\lib\site-packages\transformers\utils\generic.py", line 348, in _missing_
raise ValueError(
ValueError: (Field(name=None,type=None,default='steps',default_factory=<dataclasses._MISSING_TYPE object at 0x0000016635086850>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=None),) is not a valid IntervalStrategy, please select one of ['no', 'steps', 'epoch']
Any idea how to fix that?
Thank you!
Hello,
I have come across the following error when trying to run this model on Bridges2 using 8 GPUs. I have set up a fresh conda environment as detailed in the README.md.
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
File "/jet/home/ahabib/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
File "/jet/home/ahabib/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/jet/home/ahabib/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 862, in forward
outputs = self.bert(
File "/jet/home/ahabib/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/jet/home/ahabib/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 608, in forward
encoder_outputs = self.encoder(
File "/jet/home/ahabib/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/jet/home/ahabib/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 416, in forward
dtype=next(self.parameters()).dtype) # fp16 compatibility
StopIteration
0%| | 0/5721 [00:10<?, ?it/s]++ date '+%Y-%m-%d %T'
I have attached my full output and batch script below! Thank you!
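The last frame of the traceback above calls `next(self.parameters())`, and `next()` on an exhausted iterator raises `StopIteration`. A plausible reading, stated here as an assumption rather than a confirmed diagnosis, is that the module is running as a `torch.nn.DataParallel` replica whose `parameters()` iterator yields nothing. The sketch below reproduces only the Python mechanism with a plain list standing in for the parameters; `first_or_default` is a hypothetical helper.

```python
def first_or_default(iterable, default=None):
    """Return the first item of `iterable`, or `default` if it is empty."""
    return next(iter(iterable), default)


# Stand-in for a replica whose parameters() yields nothing (assumption).
empty_params = []

try:
    next(iter(empty_params))
    raised = False
except StopIteration:
    raised = True

print(raised)                                      # True
print(first_or_default(empty_params, "float32"))   # float32
```

If this is indeed the cause, running on a single visible GPU (for example by setting `CUDA_VISIBLE_DEVICES=0`) or launching with a distributed launcher instead of relying on `DataParallel` may avoid the code path; both are suggestions to verify, not confirmed fixes.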
Thanks for the implementation of DNABERT. I ran into a small issue; could you please help fix it?
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
When I run this, I get the following error. Any idea how to fix it?
ValueError: The model class you are passing has a config_class attribute that is not consistent with the config class you passed (model has and you passed . Fix one of those so they match!
In the paper you state, "In order to facilitate further research on large-scale genome foundational models, we have collated and made available multi-species genome datasets for both pre-training of models (Sec. 4.1) and benchmarking (Sec. 4.2)."
but I cannot find these datasets; I have looked on both Hugging Face and your GitHub.
Have I overlooked them somewhere?
Hi Zhihan,
Hope you're doing well. I had a wonderful time reading your paper on the improvements made to DNABERT and trying to set up a language model evaluation framework.
However, I question whether the labels for the splice site recognition benchmark truly represent how a user may approach this task. Going through the train/dev/test sets, I noticed that your labels only annotate whether the entire sequence is {no splice site, donor, acceptor}. Current state-of-the-art tools, such as SpliceAI, provide further information, similar to an NER task, by labeling every nucleotide within a given sequence. Has DNABERT2 been evaluated on splice site prediction framed as an NER task?
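To make the distinction between the two labeling schemes concrete, here is a minimal sketch. The label ids, the toy sequence, and the site position are all hypothetical and chosen only for illustration; they are not taken from the benchmark files or from SpliceAI.

```python
from typing import Dict, List

# Hypothetical label ids for the three classes named above.
LABELS = {"none": 0, "donor": 1, "acceptor": 2}


def per_nucleotide_labels(seq_len: int, sites: Dict[int, str]) -> List[int]:
    """Token-classification (NER-style) target: one label per nucleotide."""
    labels = [LABELS["none"]] * seq_len
    for pos, kind in sites.items():
        if not 0 <= pos < seq_len:
            raise IndexError(f"site position {pos} is outside the sequence")
        labels[pos] = LABELS[kind]
    return labels


seq = "AAGGTAAGT"        # toy sequence, not real benchmark data
sites = {3: "donor"}     # made-up site position

# Sequence-level target: a single label for the whole input.
sequence_label = LABELS["donor"]

# Per-nucleotide target: the same information, localized to a position.
token_labels = per_nucleotide_labels(len(seq), sites)
print(sequence_label)    # 1
print(token_labels)      # [0, 0, 0, 1, 0, 0, 0, 0, 0]
```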
KeyError Traceback (most recent call last)
File :21, in _fwd_kernel(Q, K, V, Bias, Out, Lse, TMP, softmax_scale, stride_qb, stride_qh, stride_qm, stride_kb, stride_kh, stride_kn, stride_vb, stride_vh, stride_vn, stride_bb, stride_bh, stride_bm, stride_ob, stride_oh, stride_om, nheads, seqlen_q, seqlen_k, seqlen_q_rounded, headdim, CACHE_KEY_SEQLEN_Q, CACHE_KEY_SEQLEN_K, BIAS_TYPE, IS_CAUSAL, BLOCK_HEADDIM, EVEN_M, EVEN_N, EVEN_HEADDIM, BLOCK_M, BLOCK_N, grid, num_warps, num_stages, extern_libs, stream, warmup)
KeyError: ('2-.-0-.-0--d6252949da17ceb5f3a278a70250af13-3b85c7bef5f0a641282f3b73af50f599-14de7de5c4da5794c8ca14e7e41a122d-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.float16, torch.float16, torch.float16, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, True, 128, 128), (True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (False, False), (False, False)))
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:937, in build_triton_ir(fn, signature, specialization, constants)
936 try:
--> 937 generator.visit(fn.parse())
938 except Exception as e:
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:183, in CodeGenerator.visit_Module(self, node)
182 def visit_Module(self, node):
--> 183 ast.NodeVisitor.generic_visit(self, node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:379, in NodeVisitor.generic_visit(self, node)
378 if isinstance(item, AST):
--> 379 self.visit(item)
380 elif isinstance(value, AST):
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:252, in CodeGenerator.visit_FunctionDef(self, node)
251 # visit function body
--> 252 has_ret = self.visit_compound_statement(node.body)
253 # finalize function
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:177, in CodeGenerator.visit_compound_statement(self, stmts)
176 for stmt in stmts:
--> 177 self.last_ret_type = self.visit(stmt)
178 if isinstance(stmt, ast.Return):
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:678, in CodeGenerator.visit_For(self, node)
677 self.scf_stack.append(node)
--> 678 self.visit_compound_statement(node.body)
679 self.scf_stack.pop()
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:177, in CodeGenerator.visit_compound_statement(self, stmts)
176 for stmt in stmts:
--> 177 self.last_ret_type = self.visit(stmt)
178 if isinstance(stmt, ast.Return):
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:319, in CodeGenerator.visit_AugAssign(self, node)
318 assign = ast.Assign(targets=[node.target], value=rhs)
--> 319 self.visit(assign)
320 return self.get_value(name)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:301, in CodeGenerator.visit_Assign(self, node)
300 names = _names[0]
--> 301 values = self.visit(node.value)
302 if not isinstance(names, tuple):
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:339, in CodeGenerator.visit_BinOp(self, node)
338 lhs = self.visit(node.left)
--> 339 rhs = self.visit(node.right)
340 fn = {
341 ast.Add: 'add',
342 ast.Sub: 'sub',
(...)
352 ast.BitXor: 'xor',
353 }[type(node.op)]
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:797, in CodeGenerator.visit_Call(self, node)
795 if (hasattr(fn, 'self') and self.is_triton_tensor(fn.self))
796 or impl.is_builtin(fn):
--> 797 return fn(*args, _builder=self.builder, **kws)
798 if fn in self.builtins.values():
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/impl/base.py:22, in builtin..wrapper(*args, **kwargs)
18 raise ValueError(
19 "Did you forget to add @triton.jit ? "
20 "(_builder argument must be provided outside of JIT functions.)"
21 )
---> 22 return fn(*args, **kwargs)
TypeError: dot() got an unexpected keyword argument 'trans_b'
The above exception was the direct cause of the following exception:
CompilationError Traceback (most recent call last)
Cell In[15], line 1
----> 1 teacher_train(T_model, cfg, train_loader, test_loader)
Cell In[14], line 39, in teacher_train(model, config, train_loader, test_loader)
37 mask = mask.to(config.device)
38 labels = labels.to(config.device)
---> 39 outputs = model(ids, mask)
40 model.zero_grad()
41 loss = F.cross_entropy(outputs, labels)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
Cell In[12], line 12, in BERT_Model.forward(self, context, mask)
11 def forward(self, context, mask):
---> 12 outputs = self.bert(context, attention_mask=mask)
13 pooled = outputs[1]
14 out = self.fc(pooled)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:608, in BertModel.forward(self, input_ids, token_type_ids, attention_mask, position_ids, output_all_encoded_layers, masked_tokens_mask, **kwargs)
605 first_col_mask[:, 0] = True
606 subset_mask = masked_tokens_mask | first_col_mask
--> 608 encoder_outputs = self.encoder(
609 embedding_output,
610 attention_mask,
611 output_all_encoded_layers=output_all_encoded_layers,
612 subset_mask=subset_mask)
614 if masked_tokens_mask is None:
615 sequence_output = encoder_outputs[-1]
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:446, in BertEncoder.forward(self, hidden_states, attention_mask, output_all_encoded_layers, subset_mask)
444 if subset_mask is None:
445 for layer_module in self.layer:
--> 446 hidden_states = layer_module(hidden_states,
447 cu_seqlens,
448 seqlen,
449 None,
450 indices,
451 attn_mask=attention_mask,
452 bias=alibi_attn_mask)
453 if output_all_encoded_layers:
454 all_encoder_layers.append(hidden_states)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:327, in BertLayer.forward(self, hidden_states, cu_seqlens, seqlen, subset_idx, indices, attn_mask, bias)
305 def forward(
306 self,
307 hidden_states: torch.Tensor,
(...)
313 bias: Optional[torch.Tensor] = None,
314 ) -> torch.Tensor:
315 """Forward pass for a BERT layer, including both attention and MLP.
316
317 Args:
(...)
325 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
326 """
--> 327 attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
328 subset_idx, indices, attn_mask, bias)
329 layer_output = self.mlp(attention_output)
330 return layer_output
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:240, in BertUnpadAttention.forward(self, input_tensor, cu_seqlens, max_s, subset_idx, indices, attn_mask, bias)
218 def forward(
219 self,
220 input_tensor: torch.Tensor,
(...)
226 bias: Optional[torch.Tensor] = None,
227 ) -> torch.Tensor:
228 """Forward pass for scaled self-attention without padding.
229
230 Arguments:
(...)
238 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
239 """
--> 240 self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
241 attn_mask, bias)
242 if subset_idx is not None:
243 return self.output(index_first_axis(self_output, subset_idx),
244 index_first_axis(input_tensor, subset_idx))
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:181, in BertUnpadSelfAttention.forward(self, hidden_states, cu_seqlens, max_seqlen_in_batch, indices, attn_mask, bias)
179 bias_dtype = bias.dtype
180 bias = bias.to(torch.float16)
--> 181 attention = flash_attn_qkvpacked_func(qkv, bias)
182 attention = attention.to(orig_dtype)
183 bias = bias.to(bias_dtype)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs)
503 if not torch._C._are_functorch_transforms_active():
504 # See NOTE: [functorch vjp and autograd interaction]
505 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506 return super().apply(*args, **kwargs) # type: ignore[misc]
508 if cls.setup_context == _SingleLevelFunction.setup_context:
509 raise RuntimeError(
510 'In order to use an autograd.Function with functorch transforms '
511 '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
512 'staticmethod. For more details, please see '
513 'https://pytorch.org/docs/master/notes/extending.func.html')
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:1021, in _FlashAttnQKVPackedFunc.forward(ctx, qkv, bias, causal, softmax_scale)
1019 if qkv.stride(-1) != 1:
1020 qkv = qkv.contiguous()
-> 1021 o, lse, ctx.softmax_scale = _flash_attn_forward(
1022 qkv[:, :, 0],
1023 qkv[:, :, 1],
1024 qkv[:, :, 2],
1025 bias=bias,
1026 causal=causal,
1027 softmax_scale=softmax_scale)
1028 ctx.save_for_backward(qkv, o, lse, bias)
1029 ctx.causal = causal
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:826, in _flash_attn_forward(q, k, v, bias, causal, softmax_scale)
823 # BLOCK = 128
824 # num_warps = 4 if d <= 64 else 8
825 grid = lambda META: (triton.cdiv(seqlen_q, META['BLOCK_M']), batch * nheads)
--> 826 _fwd_kernel[grid]( # type: ignore
827 q,
828 k,
829 v,
830 bias,
831 o,
832 lse,
833 tmp,
834 softmax_scale,
835 q.stride(0),
836 q.stride(2),
837 q.stride(1),
838 k.stride(0),
839 k.stride(2),
840 k.stride(1),
841 v.stride(0),
842 v.stride(2),
843 v.stride(1),
844 *bias_strides,
845 o.stride(0),
846 o.stride(2),
847 o.stride(1),
848 nheads,
849 seqlen_q,
850 seqlen_k,
851 seqlen_q_rounded,
852 d,
853 seqlen_q // 32,
854 seqlen_k // 32, # key for triton cache (limit number of compilations)
855 # Can't use kwargs here because triton autotune expects key to be args, not kwargs
856 # IS_CAUSAL=causal, BLOCK_HEADDIM=d,
857 bias_type,
858 causal,
859 BLOCK_HEADDIM,
860 # BLOCK_M=BLOCK, BLOCK_N=BLOCK,
861 # num_warps=num_warps,
862 # num_stages=1,
863 )
864 return o, lse, softmax_scale
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/runtime/autotuner.py:90, in Autotuner.run(self, *args, **kwargs)
88 if config.pre_hook is not None:
89 config.pre_hook(self.nargs)
---> 90 return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/runtime/autotuner.py:199, in Heuristics.run(self, *args, **kwargs)
197 for v, heur in self.values.items():
198 kwargs[v] = heur({**dict(zip(self.arg_names, args)), **kwargs})
--> 199 return self.fn.run(*args, **kwargs)
File :41, in _fwd_kernel(Q, K, V, Bias, Out, Lse, TMP, softmax_scale, stride_qb, stride_qh, stride_qm, stride_kb, stride_kh, stride_kn, stride_vb, stride_vh, stride_vn, stride_bb, stride_bh, stride_bm, stride_ob, stride_oh, stride_om, nheads, seqlen_q, seqlen_k, seqlen_q_rounded, headdim, CACHE_KEY_SEQLEN_Q, CACHE_KEY_SEQLEN_K, BIAS_TYPE, IS_CAUSAL, BLOCK_HEADDIM, EVEN_M, EVEN_N, EVEN_HEADDIM, BLOCK_M, BLOCK_N, grid, num_warps, num_stages, extern_libs, stream, warmup)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:1621, in compile(fn, **kwargs)
1619 next_module = parse(path)
1620 else:
-> 1621 next_module = compile(module)
1622 fn_cache_manager.put(next_module, f"{name}.{ir}")
1623 if os.path.exists(path):
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:1550, in compile..(src)
1545 extern_libs = kwargs.get("extern_libs", dict())
1546 # build compilation stages
1547 stages = {
1548 "ast": (lambda path: fn, None),
1549 "ttir": (lambda path: parse_mlir_module(path, context),
-> 1550 lambda src: ast_to_ttir(src, signature, configs[0], constants)),
1551 "ttgir": (lambda path: parse_mlir_module(path, context),
1552 lambda src: ttir_to_ttgir(src, num_warps, num_stages, capability)),
1553 "llir": (lambda path: Path(path).read_text(),
1554 lambda src: ttgir_to_llir(src, extern_libs, capability)),
1555 "ptx": (lambda path: Path(path).read_text(),
1556 lambda src: llir_to_ptx(src, capability)),
1557 "cubin": (lambda path: Path(path).read_bytes(),
1558 lambda src: ptx_to_cubin(src, capability))
1559 }
1560 # find out the signature of the function
1561 if isinstance(fn, triton.runtime.JITFunction):
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:962, in ast_to_ttir(fn, signature, specialization, constants)
961 def ast_to_ttir(fn, signature, specialization, constants):
--> 962 mod, _ = build_triton_ir(fn, signature, specialization, constants)
963 return optimize_triton_ir(mod)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:942, in build_triton_ir(fn, signature, specialization, constants)
940 if node is None or isinstance(e, (NotImplementedError, CompilationError)):
941 raise e
--> 942 raise CompilationError(fn.src, node) from e
943 ret = generator.module
944 # module takes ownership of the context
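The root cause in the traceback above is `TypeError: dot() got an unexpected keyword argument 'trans_b'`, which suggests that the installed Triton's `dot` does not accept the `trans_b` keyword that `flash_attn_triton.py` passes, i.e. a version mismatch. Which Triton version the repository expects is not stated in this thread, so the sketch below only reports what is installed, for comparison against the README. It uses the standard library only; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages that appear in the traceback.
for name in ("triton", "torch", "transformers"):
    print(name, installed_version(name) or "not installed")
```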
Hi Zhihan,
Thank you very much for your hard work on this repo, much appreciated!
I have been trying to recreate the results of DNABERT2 on the COVID variant prediction task from the GUE benchmark, and I have noticed a discrepancy between the train, validation, and test split sizes reported in the paper and those of the provided files.
In the paper, it is reported that the breakdown for these splits is: 77669 / 7000 / 7000.
However, using the files provided here, I get a breakdown for the splits as: 73335 / 9168 / 9168.
I haven't checked any of the other benchmark datasets but this may be an issue with others too.
Is it possible for you to fix the datasets provided to the correct splits given in the paper? This will allow others to recreate and validate your results.
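The two breakdowns quoted above differ in both their totals and their proportions, which a few lines of arithmetic make explicit. The numbers are copied from this post; the helper name `summarize` is hypothetical.

```python
paper = (77669, 7000, 7000)   # train / validation / test as reported in the paper
files = (73335, 9168, 9168)   # train / validation / test as counted from the files


def summarize(split):
    """Return the total size and each split's share, rounded to 3 decimals."""
    total = sum(split)
    return total, [round(n / total, 3) for n in split]


print(summarize(paper))  # (91669, [0.847, 0.076, 0.076])
print(summarize(files))  # (91671, [0.8, 0.1, 0.1])
```

The provided files look like an 80/10/10 split, while the paper's numbers are closer to 85/7.5/7.5, and the totals differ by two sequences.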
Why not install Triton by running pip install triton? Installing Triton from source seems to run into errors that are difficult to resolve.
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
Hello, when I ran the above code, the following error was reported. Can you help me find the cause?
TypeError Traceback (most recent call last)
Cell In[6], line 5
2 from transformers import AutoTokenizer, AutoModel
4 tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
----> 5 model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py:479, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
475 model_class = get_class_from_dynamic_module(
476 class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs
477 )
478 _ = hub_kwargs.pop("code_revision", None)
--> 479 return model_class.from_pretrained(
480 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
481 )
482 elif type(config) in cls._model_mapping.keys():
483 model_class = _get_model_class(config, cls._model_mapping)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/transformers/modeling_utils.py:2675, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
2672 init_contexts.append(init_empty_weights())
2674 with ContextManagers(init_contexts):
-> 2675 model = cls(config, *model_args, **model_kwargs)
2677 # Check first if we are from_pt
2678 if use_keep_in_fp32_modules:
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:570, in BertModel.__init__(self, config, add_pooling_layer)
568 super(BertModel, self).__init__(config)
569 self.embeddings = BertEmbeddings(config)
--> 570 self.encoder = BertEncoder(config)
571 self.pooler = BertPooler(config) if add_pooling_layer else None
572 self.post_init()
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:345, in BertEncoder.__init__(self, config)
343 def __init__(self, config):
344 super().__init__()
--> 345 layer = BertLayer(config)
346 self.layer = nn.ModuleList(
347 [copy.deepcopy(layer) for _ in range(config.num_hidden_layers)])
349 self.num_attention_heads = config.num_attention_heads
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:303, in BertLayer.__init__(self, config)
301 super(BertLayer, self).__init__()
302 self.attention = BertUnpadAttention(config)
--> 303 self.mlp = BertGatedLinearUnitMLP(config)
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:270, in BertGatedLinearUnitMLP.__init__(self, config)
266 self.config = config
267 self.gated_layers = nn.Linear(config.hidden_size,
268 config.intermediate_size * 2,
269 bias=False)
--> 270 self.act = nn.GELU(approximate='none')
271 self.wo = nn.Linear(config.intermediate_size, config.hidden_size)
272 self.dropout = nn.Dropout(config.hidden_dropout_prob)
TypeError: __init__() got an unexpected keyword argument 'approximate'
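The error above indicates that the installed PyTorch's `nn.GELU` does not accept the `approximate` keyword. That keyword appears to be available from PyTorch 1.12 onward; treat the exact threshold as an assumption and check the release notes for your build. A minimal sketch of a version check follows, with hypothetical helper names and version strings passed in as plain text so that it runs without PyTorch installed.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.0.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def gelu_accepts_approximate(torch_version: str,
                             minimum: Tuple[int, int] = (1, 12)) -> bool:
    """True if the version is at least `minimum` (assumed threshold: 1.12)."""
    return major_minor(torch_version) >= minimum


print(gelu_accepts_approximate("1.11.0"))       # False
print(gelu_accepts_approximate("2.0.1+cu117"))  # True
```

In a real environment the string to check would be `torch.__version__`.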
Dear Zhihan,
Which license applies to DNABERT2: the Apache 2.0 License or another one?
Thanks.
Shicheng
Hello, I was testing the finetune code but I got this error:
Traceback (most recent call last):
File "bin/DNABERT_2/finetune/train.py", line 286, in <module>
train()
File "bin/DNABERT_2/finetune/train.py", line 268, in train
trainer.train()
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 1645, in train
return inner_training_loop(
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2020, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2321, in _maybe_log_save_evaluate
metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 3053, in evaluate
output = eval_loop(
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 3353, in evaluation_loop
metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
File "bin/DNABERT_2/finetune/train.py", line 204, in compute_metrics
return calculate_metric_with_sklearn(logits, labels)
File "bin/DNABERT_2/finetune/train.py", line 190, in calculate_metric_with_sklearn
predictions = np.argmax(logits, axis=-1)
File "<array_function internals>", line 200, in argmax
File "/miniconda/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 1242, in argmax
return _wrapfunc(a, 'argmax', axis=axis, out=out, **kwds)
File "/miniconda/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
return _wrapit(obj, method, *args, **kwds)
File "/miniconda/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 471) + inhomogeneous part.
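The detected shape `(2, 471)` plus an inhomogeneous part is consistent with `logits` arriving as a tuple of two differently shaped outputs rather than a single array. That is an inference from the error message, not something confirmed in this thread. A minimal sketch of a guard, where `unwrap_logits` is a hypothetical helper and not part of train.py:

```python
def unwrap_logits(predictions):
    """Return the first element when predictions is a tuple, else predictions itself.

    Assumption: when the model returns several outputs, the class logits
    come first and the remaining entries are extra tensors such as hidden states.
    """
    if isinstance(predictions, tuple):
        return predictions[0]
    return predictions


# Plain nested lists stand in for NumPy arrays in this illustration.
logits = [[0.1, 0.9], [0.8, 0.2]]
extras = [[[0.0] * 4] * 3] * 2
assert unwrap_logits((logits, extras)) is logits
assert unwrap_logits(logits) is logits
```

In `calculate_metric_with_sklearn` the guard would be applied before the argmax, for example `np.argmax(unwrap_logits(logits), axis=-1)`.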
Issue:
I am trying to load a pretrained DNABERT model from the model hub using the AutoModel and AutoTokenizer classes from the transformers library. However, I am encountering an error related to inconsistent config_class attributes.
Code to Reproduce:
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
Error Message:
ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.81ac6a98387cf94bc283553260f3fa6b88cef2fa.configuration_bert.BertConfig'>. Fix one of those so they match!
Environment Details:
Dear Zhihan,
I am curious whether there has been any evaluation to identify the best ratio of human to non-human data. The GENA-LM model included human T2T and 1000 Genomes Project data to make the human to non-human ratio 1:1.
I am wondering how you thought about this ratio when developing this model.
Thanks.
Shicheng
Hi,
When I run the code from the README.md:
# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
I receive the error shown at the bottom of this post. I have created a fresh conda environment and installed the packages listed in the README.md.
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
/Users/colm/repos/dsb-gene-function-prediction/notebooks/exploratory/generate_embeddings_for_dna_sequences.ipynb Cell 1 in 5
2 from transformers import AutoTokenizer, AutoModelForMaskedLM
4 tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True, revision="5fd206e")
----> 5 model = AutoModelForMaskedLM.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True, revision="5fd206e")
File ~/anaconda3/envs/DSBPredict/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py:430, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
428 class_ref = config.auto_map[cls.__name__]
429 module_file, class_name = class_ref.split(".")
--> 430 model_class = get_class_from_dynamic_module(
431 pretrained_model_name_or_path, module_file + ".py", class_name, **kwargs
432 )
433 return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
434 elif type(config) in cls._model_mapping.keys():
File ~/anaconda3/envs/DSBPredict/lib/python3.8/site-packages/transformers/models/auto/dynamic.py:231, in get_class_from_dynamic_module(pretrained_model_name_or_path, module_file, class_name, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, **kwargs)
229 # And lastly we get the class inside our newly created module
230 final_module = os.path.join(full_submodule, module_name.replace(".py", ""))
--> 231 return get_class_in_module(class_name, final_module)
File ~/anaconda3/envs/DSBPredict/lib/python3.8/site-packages/transformers/models/auto/dynamic.py:103, in get_class_in_module(class_name, module_path)
99 """
100 Import a module on the cache directory for modules and extract a class from it.
101 """
...
25 unpad_input, unpad_input_only)
27 try:
28 from .flash_attn_triton import flash_attn_qkvpacked_func
ModuleNotFoundError: No module named 'transformers_modules.zhihan1996.DNABERT-2-117M.bert_padding'
Thank you for your exceptional work!
Your model outperforms others with its efficiency in terms of parameters and speed. We have noticed that flash-attention has recently released version 2, which has greatly improved computation speed. We kindly request the incorporation of this update as it is urgently needed.
Once again, thank you for your hard work! :)
Hi,
Thank you so much for DNABERT-2! There were some issues when I ran the model and used the results.
First, when I run the sample code, I find that hidden_states.shape is torch.Size([1, 17, 768]), not torch.Size([1, len(dna), 768]).
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
hidden_states = model(inputs)[0] # [1, sequence_length, 768]
# Question 1 : torch.Size([1, 17, 768]) not torch.Size([1, len(dna), 768])
print(hidden_states.shape)
Second, I have observed that model(inputs) returns two tensors with sizes torch.Size([1, 17, 768]) and torch.Size([1, 768]). What does the second tensor mean?
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors='pt')["input_ids"]
result = model(inputs)
# output: 2
print(len(result))
# Question 2: What is the meaning of result[1]. shape
# output: torch.Size([1, 17, 768]) torch.Size([1, 768])
print(result[0].shape, result[1].shape)
Third, if multiple DNA sequences are passed in together with padding enabled, how can I obtain, for each sequence, an encoding of its own length (instead of the padded length)?
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
dna = ["ACGTAGCATCGGATCTATTTAGC", "ACGTAGCATCGGATCTATCTATCGACACCTATCATCTCGTTAGC", "ACGTATTATCGATCTACGAGCATCTCGTTAGC"]
inputs = tokenizer(dna, return_tensors='pt', padding=True)["input_ids"]
hidden_states = model(inputs)[0] # [batch_size, padded_sequence_length, 768]
# torch.Size([3, 13, 768])
print(hidden_states.shape)
For the third question, this is how it is done with ESM. I want to obtain the same kind of sequence_representations.
import torch
# Load ESM-2 model
model, alphabet = torch.hub.load("facebookresearch/esm:main", "esm2_t33_650M_UR50D")
batch_converter = alphabet.get_batch_converter()
model.eval() # disables dropout for deterministic results
# Prepare data (first 2 sequences from ESMStructuralSplitDataset superfamily / 4)
data = [
("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"),
("protein2", "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE"),
("protein2 with mask", "KALTARQQEVFDLIRD<mask>ISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE"),
("protein3", "K A <mask> I S Q"),
]
batch_labels, batch_strs, batch_tokens = batch_converter(data)
batch_lens = (batch_tokens != alphabet.padding_idx).sum(1)
# Extract per-residue representations (on CPU)
with torch.no_grad():
    results = model(batch_tokens, repr_layers=[33], return_contacts=True)
token_representations = results["representations"][33]
# Output: torch.Size([4, 73, 1280])
print(token_representations.shape)
# Generate per-sequence representations via averaging
# NOTE: token 0 is always a beginning-of-sequence token, so the first residue is token 1.
sequence_representations = []
for i, tokens_len in enumerate(batch_lens):
    sequence_representation = token_representations[i, 1: tokens_len - 1].mean(0)
    # Output: torch.Size([1280])
    print(sequence_representation.shape)
    sequence_representations.append(sequence_representation)
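The per-sequence averaging in the ESM example can be carried over to a padded batch by using a mask that marks real tokens. Hugging Face tokenizers normally return an `attention_mask` alongside `input_ids` when `padding=True`; whether this tokenizer does so is an assumption here. The sketch below uses plain nested lists so that it runs without PyTorch, and `mean_pool_unpadded` is a hypothetical helper name.

```python
def mean_pool_unpadded(hidden_states, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    hidden_states: nested lists shaped [batch, seq_len, hidden]
    attention_mask: nested lists shaped [batch, seq_len]; 1 = real token, 0 = padding
    """
    pooled = []
    for vectors, mask in zip(hidden_states, attention_mask):
        # Keep only the vectors whose mask entry is non-zero.
        kept = [vec for vec, keep in zip(vectors, mask) if keep]
        if not kept:
            raise ValueError("sequence has no unmasked tokens")
        hidden = len(kept[0])
        pooled.append([sum(vec[d] for vec in kept) / len(kept) for d in range(hidden)])
    return pooled


# Two sequences padded to length 3; the first has one padded position.
hidden = [
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
    [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]],
]
mask = [[1, 1, 0], [1, 1, 1]]
print(mean_pool_unpadded(hidden, mask))  # [[2.0, 3.0], [7.0, 8.0]]
```

Unlike the ESM snippet, this keeps any special tokens at the start and end of each sequence in the average. With tensors, the same result is usually written as `(hidden_states * mask.unsqueeze(-1)).sum(1) / mask.sum(1, keepdim=True)`.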
Thanks !!!
I tried to run the fine-tuning script. I put the sequences and labels into CSV files, following the format of the sample data, but when I ran it the error below occurred and I do not know why. Could anyone help me? Thank you.
trainer.train()
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/trainer.py", line 2679, in training_step
loss = self.compute_loss(model, inputs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/trainer.py", line 2704, in compute_loss
outputs = model(**inputs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
output.reraise()
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 1562, in forward
outputs = self.bert(
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 988, in forward
buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
RuntimeError: The expanded size of the tensor (673) must match the existing size (512) at non-singleton dimension 1. Target sizes: [8, 673]. Tensor sizes: [1, 512]
It seems that the tensor dimensions do not match, but I did not change any code. The command I ran was:
python train.py --model_name_or_path zhihan1996/DNABERT-2-117M --data_path splited/csv_files --kmer -1 --run_name DNABERT2_test1 --model_max_length 700 --per_device_train_batch_size 8 --per_device_eval_batch_size 16 --gradient_accumulation_steps 1 --learning_rate 3e-5 --num_train_epochs 3 --fp16 --save_steps 200 --output_dir output/dnabert2 --evaluation_strategy steps --eval_steps 200 --warmup_steps 50 --logging_steps 100000 --overwrite_output_dir True --log_level info --find_unused_parameters False
I set the max length hyperparameter to 700 because my sequences are long.
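The traceback goes through transformers/models/bert/modeling_bert.py, so the stock BERT implementation appears to be what was loaded, and the `[1, 512]` buffer suggests it is limited to 512 positions. One possible workaround under that assumption is to split each tokenized input into windows no longer than the limit. `split_into_windows` is a hypothetical helper, and the default of 512 is taken from the error message rather than from the model config.

```python
def split_into_windows(token_ids, max_len=512, overlap=0):
    """Split a list of token ids into consecutive windows of at most max_len.

    overlap is the number of ids shared between neighbouring windows.
    Special tokens are not added here, so max_len should leave room for them.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
        start += step
    return windows


# A 673-token input, the length reported in the error above.
ids = list(range(673))
print([len(w) for w in split_into_windows(ids)])              # [512, 161]
print([len(w) for w in split_into_windows(ids, overlap=64)])  # [512, 225]
```

Each window would then be scored separately and the results combined downstream; how best to combine them depends on the task and is not addressed here.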
Hi @Zhihan1996, I must say this is excellent work.
I was going through GUE and its datasets. I have COVID data that I want to fine-tune on. Can you please tell me how to do that? DNABERT-2 already includes COVID data, so this would be really helpful for me. Thanks!
I used the code for computing embeddings, but it raises an error.
dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
hidden_states = model(inputs)[0] # [1, sequence_length, 768]
embedding_mean = torch.mean(hidden_states[0], dim=0)
print(embedding_mean.shape) # expect to be 768
embedding_max = torch.max(hidden_states[0], dim=0)[0]
print(embedding_max.shape) # expect to be 768
AssertionError Traceback (most recent call last)
in <cell line: 3>()
1 dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
2 inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
----> 3 hidden_states = model(inputs)[0] # [1, sequence_length, 768]
4
5 # embedding with mean pooling
12 frames
~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py in _flash_attn_forward(q, k, v, bias, causal, softmax_scale)
779 assert q.dtype in [torch.float16,
780 torch.bfloat16], 'Only support fp16 and bf16'
--> 781 assert q.is_cuda and k.is_cuda and v.is_cuda
782 softmax_scale = softmax_scale or 1.0 / math.sqrt(d)
783
AssertionError:
Please let me know what I should do.
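The failing assertion sits in flash_attn_triton.py and requires CUDA tensors, so this is the error one would expect when the model runs on CPU while the Triton attention path is active. Another traceback on this page shows that import wrapped in a `try`, which suggests (an assumption, not verified here) that the model falls back to a non-Triton attention path when the import fails. A small stdlib-only check, with `triton_installed` as a hypothetical helper name:

```python
import importlib.util


def triton_installed() -> bool:
    """Report whether a top-level 'triton' package is importable in this environment."""
    return importlib.util.find_spec("triton") is not None


if triton_installed():
    print("triton is installed: the Triton flash-attention path may be selected, "
          "and it needs CUDA tensors")
else:
    print("triton is not installed: the Triton flash-attention path cannot be imported")
```

If Triton is installed and no GPU is available, the two options this points to are uninstalling it or moving both the model and the inputs to a CUDA device.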
Hi,
As instructed, when I run:
import torch
from transformers import AutoModel
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
I got this error:
ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.81ac6a98387cf94bc283553260f3fa6b88cef2fa.configuration_bert.BertConfig'>. Fix one of those so they match!
And when I set trust_remote_code=False, I got these warnings:
Some weights of BertModel were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.1.attention.self.key.weight', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.2.attention.self.value.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.embeddings.position_embeddings.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.1.attention.self.query.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.0.attention.self.key.bias', 'bert.encoder.layer.1.attention.self.value.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.1.attention.self.query.bias', 'bert.encoder.layer.10.output.dense.bias', 
'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.pooler.dense.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.0.intermediate.dense.weight', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.2.intermediate.dense.weight', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.0.attention.self.value.weight', 'bert.encoder.layer.1.intermediate.dense.weight', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.1.output.LayerNorm.bias', 'bert.encoder.layer.2.attention.self.key.weight', 'bert.encoder.layer.0.output.dense.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 
'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.1.attention.self.key.bias', 'bert.encoder.layer.0.attention.self.key.weight', 'bert.encoder.layer.2.output.dense.weight', 'bert.encoder.layer.0.output.LayerNorm.weight', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.1.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.0.attention.self.query.weight', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.0.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.2.attention.self.key.bias', 'bert.encoder.layer.2.attention.self.value.weight', 'bert.encoder.layer.2.attention.self.query.weight', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.0.output.dense.bias', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.1.output.dense.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.2.attention.self.query.bias', 'bert.encoder.layer.0.attention.self.value.bias', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.0.attention.self.query.bias', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.2.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 
'bert.encoder.layer.2.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.1.output.LayerNorm.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.1.output.dense.bias', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.pooler.dense.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.2.output.dense.bias', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.2.output.LayerNorm.weight', 'bert.encoder.layer.0.intermediate.dense.bias', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.1.attention.self.value.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
What should I do to load the pre-trained version of DNABERT-2-117M?
I want to fine-tune the model for another task.
When calling:
hidden_states = model(inputs)[0] # [1, sequence_length, 768]
we receive this traceback:
in <cell line: 1>:1 │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:608 in forward │
│ │
│ 605 │ │ │ first_col_mask[:, 0] = True │
│ 606 │ │ │ subset_mask = masked_tokens_mask | first_col_mask │
│ 607 │ │ │
│ ❱ 608 │ │ encoder_outputs = self.encoder( │
│ 609 │ │ │ embedding_output, │
│ 610 │ │ │ attention_mask, │
│ 611 │ │ │ output_all_encoded_layers=output_all_encoded_layers, │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:446 in forward │
│ │
│ 443 │ │ all_encoder_layers = [] │
│ 444 │ │ if subset_mask is None: │
│ 445 │ │ │ for layer_module in self.layer: │
│ ❱ 446 │ │ │ │ hidden_states = layer_module(hidden_states, │
│ 447 │ │ │ │ │ │ │ │ │ │ │ cu_seqlens, │
│ 448 │ │ │ │ │ │ │ │ │ │ │ seqlen, │
│ 449 │ │ │ │ │ │ │ │ │ │ │ None, │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:327 in forward │
│ │
│ 324 │ │ │ attn_mask: None or (batch, max_seqlen_in_batch) │
│ 325 │ │ │ bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch) │
│ 326 │ │ """ │
│ ❱ 327 │ │ attention_output = self.attention(hidden_states, cu_seqlens, seqlen, │
│ 328 │ │ │ │ │ │ │ │ │ │ subset_idx, indices, attn_mask, bias) │
│ 329 │ │ layer_output = self.mlp(attention_output) │
│ 330 │ │ return layer_output │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:240 in forward │
│ │
│ 237 │ │ │ attn_mask: None or (batch, max_seqlen_in_batch) │
│ 238 │ │ │ bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch) │
│ 239 │ │ """ │
│ ❱ 240 │ │ self_output = self.self(input_tensor, cu_seqlens, max_s, indices, │
│ 241 │ │ │ │ │ │ │ │ attn_mask, bias) │
│ 242 │ │ if subset_idx is not None: │
│ 243 │ │ │ return self.output(index_first_axis(self_output, subset_idx), │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:181 in forward │
│ │
│ 178 │ │ │ │ qkv = qkv.to(torch.float16) │
│ 179 │ │ │ │ bias_dtype = bias.dtype │
│ 180 │ │ │ │ bias = bias.to(torch.float16) │
│ ❱ 181 │ │ │ │ attention = flash_attn_qkvpacked_func(qkv, bias) │
│ 182 │ │ │ │ attention = attention.to(orig_dtype) │
│ 183 │ │ │ │ bias = bias.to(bias_dtype) │
│ 184 │ │ │ else: │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/autograd/function.py:506 in apply │
│ │
│ 503 │ │ if not torch._C._are_functorch_transforms_active(): │
│ 504 │ │ │ # See NOTE: [functorch vjp and autograd interaction] │
│ 505 │ │ │ args = _functorch.utils.unwrap_dead_wrappers(args) │
│ ❱ 506 │ │ │ return super().apply(*args, **kwargs) # type: ignore[misc] │
│ 507 │ │ │
│ 508 │ │ if cls.setup_context == _SingleLevelFunction.setup_context: │
│ 509 │ │ │ raise RuntimeError( │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/flash_attn_triton.py:1021 in forward │
│ │
│ 1018 │ │ # Make sure that the last dimension is contiguous │
│ 1019 │ │ if qkv.stride(-1) != 1: │
│ 1020 │ │ │ qkv = qkv.contiguous() │
│ ❱ 1021 │ │ o, lse, ctx.softmax_scale = _flash_attn_forward( │
│ 1022 │ │ │ qkv[:, :, 0], │
│ 1023 │ │ │ qkv[:, :, 1], │
│ 1024 │ │ │ qkv[:, :, 2], │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/flash_attn_triton.py:781 in _flash_attn_forward │
│ │
│ 778 │ assert q.dtype == k.dtype == v.dtype, 'All tensors must have the same type' │
│ 779 │ assert q.dtype in [torch.float16, │
│ 780 │ │ │ │ │ torch.bfloat16], 'Only support fp16 and bf16' │
│ ❱ 781 │ assert q.is_cuda and k.is_cuda and v.is_cuda │
│ 782 │ softmax_scale = softmax_scale or 1.0 / math.sqrt(d) │
│ 783 │ │
│ 784 │ has_bias = bias is not None │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError
Hi,
Thank you so much for updating DNABERT!!!
I'm running into the following error both on the command line and when I try to run the scripts/run_dnabert2.sh script. Do you have any insight into what's going wrong here?
Here's the command line version:
Python 3.8.17 (default, Jul 5 2023, 21:04:15)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/nas/longleaf/home/mkratz/.conda/envs/dna/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 487, in from_pretrained
cls.register(config.__class__, model_class, exist_ok=True)
File "/nas/longleaf/home/mkratz/.conda/envs/dna/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 513, in register
raise ValueError(
ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.5fd206e1a13cee3ef4a608677312175eb6f8143d.configuration_bert.BertConfig'>. Fix one of those so they match!
Thanks!!!
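One workaround that may be worth trying, assuming the mismatch arises because the model class expects the built-in `transformers` `BertConfig` while the checkpoint's remote code supplies its own class: load the configuration explicitly and pass it to `from_pretrained`. This is a sketch under that assumption, not a confirmed fix from the maintainers. `load_dnabert2` is a hypothetical helper name, and trying a different `transformers` release is another option.

```python
MODEL_ID = "zhihan1996/DNABERT-2-117M"


def load_dnabert2(model_id: str = MODEL_ID):
    """Load tokenizer and model, passing the built-in BertConfig explicitly.

    Hypothetical helper: the heavy imports live inside the function so the
    module can be imported without torch/transformers installed.
    """
    from transformers import AutoModel, AutoTokenizer
    from transformers.models.bert.configuration_bert import BertConfig

    # Load config.json with the built-in BertConfig class, so the config
    # object has the class the error message says the model expects.
    config = BertConfig.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id, trust_remote_code=True, config=config
    )
    return tokenizer, model
```

Calling `tokenizer, model = load_dnabert2()` downloads the checkpoint on first use, as in the quickstart.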
Hello! Congratulations on your work. I was so happy to see this come out!
I have a question about running the GUE tests according to your instructions. In DNABERT1 you had to physically download the pre-trained model. It seems from looking at the code that now the models are hosted at huggingface and I do not need to download the pre-trained models to run these tests. Is that correct?
My goal is to do some further pre-training with bacterial datasets starting from a pre-trained model. To do this, would I load your pre-trained model as a checkpoint and then run a script similar to the pretraining script from DNABERT1?
Thanks for any help you can provide and again, great work!
LeAnn
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
hidden_states = model(inputs)[0] # [1, sequence_length, 768]
embedding_mean = torch.mean(hidden_states[0], dim=0)
print(embedding_mean.shape) # expect to be 768
embedding_max = torch.max(hidden_states[0], dim=0)[0]
print(embedding_max.shape) # expect to be 768
/anaconda3/envs/dna/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertModel: ['cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight']
AssertionError Traceback (most recent call last)
Cell In[1], line 9
7 dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
8 inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
----> 9 hidden_states = model(inputs)[0] # [1, sequence_length, 768]
11 # embedding with mean pooling
12 embedding_mean = torch.mean(hidden_states[0], dim=0)
File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:608, in BertModel.forward(self, input_ids, token_type_ids, attention_mask, position_ids, output_all_encoded_layers, masked_tokens_mask, **kwargs)
605 first_col_mask[:, 0] = True
606 subset_mask = masked_tokens_mask | first_col_mask
--> 608 encoder_outputs = self.encoder(
609 embedding_output,
610 attention_mask,
611 output_all_encoded_layers=output_all_encoded_layers,
612 subset_mask=subset_mask)
614 if masked_tokens_mask is None:
615 sequence_output = encoder_outputs[-1]
File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:446, in BertEncoder.forward(self, hidden_states, attention_mask, output_all_encoded_layers, subset_mask)
444 if subset_mask is None:
445 for layer_module in self.layer:
--> 446 hidden_states = layer_module(hidden_states,
447 cu_seqlens,
448 seqlen,
449 None,
450 indices,
451 attn_mask=attention_mask,
452 bias=alibi_attn_mask)
453 if output_all_encoded_layers:
454 all_encoder_layers.append(hidden_states)
File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:327, in BertLayer.forward(self, hidden_states, cu_seqlens, seqlen, subset_idx, indices, attn_mask, bias)
305 def forward(
306 self,
307 hidden_states: torch.Tensor,
(...)
313 bias: Optional[torch.Tensor] = None,
314 ) -> torch.Tensor:
315 """Forward pass for a BERT layer, including both attention and MLP.
316
317 Args:
(...)
325 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
326 """
--> 327 attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
328 subset_idx, indices, attn_mask, bias)
329 layer_output = self.mlp(attention_output)
330 return layer_output
File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:240, in BertUnpadAttention.forward(self, input_tensor, cu_seqlens, max_s, subset_idx, indices, attn_mask, bias)
218 def forward(
219 self,
220 input_tensor: torch.Tensor,
(...)
226 bias: Optional[torch.Tensor] = None,
227 ) -> torch.Tensor:
228 """Forward pass for scaled self-attention without padding.
229
230 Arguments:
(...)
238 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
239 """
--> 240 self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
241 attn_mask, bias)
242 if subset_idx is not None:
243 return self.output(index_first_axis(self_output, subset_idx),
244 index_first_axis(input_tensor, subset_idx))
File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:181, in BertUnpadSelfAttention.forward(self, hidden_states, cu_seqlens, max_seqlen_in_batch, indices, attn_mask, bias)
179 bias_dtype = bias.dtype
180 bias = bias.to(torch.float16)
--> 181 attention = flash_attn_qkvpacked_func(qkv, bias)
182 attention = attention.to(orig_dtype)
183 bias = bias.to(bias_dtype)
File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs)
503 if not torch._C._are_functorch_transforms_active():
504 # See NOTE: [functorch vjp and autograd interaction]
505 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506 return super().apply(*args, **kwargs) # type: ignore[misc]
508 if cls.setup_context == _SingleLevelFunction.setup_context:
509 raise RuntimeError(
510 'In order to use an autograd.Function with functorch transforms '
511 '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
512 'staticmethod. For more details, please see '
513 'https://pytorch.org/docs/master/notes/extending.func.html')
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:1021, in _FlashAttnQKVPackedFunc.forward(ctx, qkv, bias, causal, softmax_scale)
1019 if qkv.stride(-1) != 1:
1020 qkv = qkv.contiguous()
-> 1021 o, lse, ctx.softmax_scale = _flash_attn_forward(
1022 qkv[:, :, 0],
1023 qkv[:, :, 1],
1024 qkv[:, :, 2],
1025 bias=bias,
1026 causal=causal,
1027 softmax_scale=softmax_scale)
1028 ctx.save_for_backward(qkv, o, lse, bias)
1029 ctx.causal = causal
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:781, in _flash_attn_forward(q, k, v, bias, causal, softmax_scale)
778 assert q.dtype == k.dtype == v.dtype, 'All tensors must have the same type'
779 assert q.dtype in [torch.float16,
780 torch.bfloat16], 'Only support fp16 and bf16'
--> 781 assert q.is_cuda and k.is_cuda and v.is_cuda
782 softmax_scale = softmax_scale or 1.0 / math.sqrt(d)
784 has_bias = bias is not None
AssertionError:
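The last frame of the traceback fails on `assert q.is_cuda and k.is_cuda and v.is_cuda`, so the Triton flash-attention path was reached with CPU tensors. A minimal sketch of a pre-flight check and a device-aware version of the snippet follows. The helper names are hypothetical, and the claim that the remote code falls back to plain PyTorch attention when `triton` is not importable is an assumption to verify against `bert_layers.py`.

```python
import importlib.util


def triton_is_installed() -> bool:
    """Return True if the `triton` package can be found in this environment."""
    return importlib.util.find_spec("triton") is not None


def preflight(cuda_available: bool, triton_installed: bool) -> str:
    """Describe what to expect for a given CUDA/Triton combination."""
    if cuda_available:
        return "cuda: move both the model and input_ids to the GPU"
    if triton_installed:
        return "cpu+triton: expect the is_cuda assertion; use a GPU or remove triton"
    # Assumption: without triton the remote code uses PyTorch attention.
    return "cpu without triton: the PyTorch attention fallback should be used"


def embed_mean(dna: str, model_id: str = "zhihan1996/DNABERT-2-117M"):
    """Mean-pooled embedding with model and inputs on the same device."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    cuda = torch.cuda.is_available()
    print(preflight(cuda, triton_is_installed()))
    device = torch.device("cuda" if cuda else "cpu")

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    model = model.to(device).eval()

    input_ids = tokenizer(dna, return_tensors="pt")["input_ids"].to(device)
    with torch.no_grad():
        hidden_states = model(input_ids)[0]  # [1, sequence_length, hidden]
    return hidden_states[0].mean(dim=0).cpu()
```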
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
triggers an error:
The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.81ac6a98387cf94bc283553260f3fa6b88cef2fa.configuration_bert.BertConfig'>. Fix one of those so they match!
Any idea how to bypass this?
Another question: after obtaining the token embeddings, is there any way to convert them back to embeddings for each nucleotide? Thanks!
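On the per-nucleotide question, one simple heuristic (not the authors' method) is to repeat each token's vector once for every nucleotide the token covers. The sketch below assumes that each non-special BPE token string, for example as returned by `tokenizer.convert_ids_to_tokens`, is a plain substring of the input DNA, so its length equals the number of bases it spans. `expand_to_nucleotides` and `SPECIAL_TOKENS` are hypothetical names.

```python
from typing import List, Sequence

# Assumed special-token strings; adjust to the tokenizer actually in use.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]", "[UNK]", "[MASK]"}


def expand_to_nucleotides(
    tokens: Sequence[str], vectors: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Repeat each token vector len(token) times, skipping special tokens."""
    if len(tokens) != len(vectors):
        raise ValueError("tokens and vectors must have the same length")
    per_base: List[List[float]] = []
    for token, vector in zip(tokens, vectors):
        if token in SPECIAL_TOKENS:
            continue
        # One independent copy of the vector per nucleotide in the token.
        per_base.extend(list(vector) for _ in range(len(token)))
    return per_base


tokens = ["[CLS]", "ACG", "T", "[SEP]"]
vectors = [[0.0], [1.0], [2.0], [3.0]]
per_base = expand_to_nucleotides(tokens, vectors)
print(len(per_base))  # one vector per nucleotide of "ACGT"
```

Every base inside a token receives the same vector, so this only restores positional alignment and does not recover single-nucleotide resolution.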
Dear Zhihan,
Thank you so much for your great contribution to the field in developing this pretrained model. The manuscript doesn't explicitly say whether the attention maps (DNABERT-viz) are generated at the pretraining stage or the fine-tuning stage. Could you please give a more explicit explanation?
Thanks.
Shicheng
Hi,
I just wanted to note that installing requirements.txt in a clean environment pulls in package versions that are not compatible with each other, which leads to errors with triton. Installing some of the versions listed in this discussion fixed it for me.
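When reporting this kind of incompatibility, it helps to list the versions that actually ended up in the environment. A small sketch using only the standard library follows. The package list is an assumption based on the libraries that appear in the tracebacks on this page, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = installed_versions(["torch", "transformers", "triton"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```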
Thanks again for your work!
What is the problem here, and what is the solution?
The provided data_path is /home/shiro/DNABERT_2/finetune
2023-08-31 17:57:18.856636: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/local/cuda/lib64:
2023-08-31 17:57:18.856685: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train_modified.py", line 332, in <module>
train()
File "train_modified.py", line 314, in train
trainer.train()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 2745, in training_step
self.scaler.scale(loss).backward()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/home/shiro/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line 1041, in backward
_flash_attn_backward(do,
File "/home/shiro/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line 949, in _flash_attn_backward
_bwd_kernel[grid]( # type: ignore
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 73, in run
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 73, in <dictcomp>
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 63, in _bench
return do_bench(kernel_call)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/testing.py", line 140, in do_bench
fn()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 62, in kernel_call
self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 200, in run
return self.fn.run(*args, **kwargs)
File "<string>", line 43, in _bwd_kernel
RuntimeError: Triton Error [CUDA]: invalid argument
0%| | 0/5700 [00:00<?, ?it/s
Is the performance reported in Table 4 the F1 score? I always get higher scores when I run run_dnabert2.sh...
Hey,
Congrats on your contribution, but the code has some problems. It may well have worked when you trained the model, but there are now some severe issues. Please don't get me wrong: I hope you can fix them, or give me an opportunity to contribute fixes.
The key issues with DNABERT-2
Are you planning to fix these issues? Thanks for the Hugging Face model, by the way. That seems to be working OK!
Hi, do you have any examples of how to extract sequence embeddings using this model?
I tried the following code but get an error:
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M")
model = AutoModelForMaskedLM.from_pretrained("zhihan1996/DNABERT-2-117M")
tok = tokenizer("ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC", return_tensors = 'pt')
outs = model(tok)
Gives error below:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:254, in BatchEncoding.__getattr__(self, item)
253 try:
--> 254 return self.data[item]
255 except KeyError:
KeyError: 'size'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
Cell In[67], line 1
----> 1 outs = model(tok)
File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:1358, in BertForMaskedLM.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, labels, output_attentions, output_hidden_states, return_dict)
1349 r"""
1350 labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
1351 Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
1352 config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the
1353 loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`
1354 """
1356 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
-> 1358 outputs = self.bert(
1359 input_ids,
1360 attention_mask=attention_mask,
1361 token_type_ids=token_type_ids,
1362 position_ids=position_ids,
1363 head_mask=head_mask,
1364 inputs_embeds=inputs_embeds,
1365 encoder_hidden_states=encoder_hidden_states,
1366 encoder_attention_mask=encoder_attention_mask,
1367 output_attentions=output_attentions,
1368 output_hidden_states=output_hidden_states,
1369 return_dict=return_dict,
1370 )
1372 sequence_output = outputs[0]
1373 prediction_scores = self.cls(sequence_output)
File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:968, in BertModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
966 raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
967 elif input_ids is not None:
--> 968 input_shape = input_ids.size()
969 elif inputs_embeds is not None:
970 input_shape = inputs_embeds.size()[:-1]
File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:256, in BatchEncoding.__getattr__(self, item)
254 return self.data[item]
255 except KeyError:
--> 256 raise AttributeError
AttributeError:
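The traceback shows `input_ids.size()` being called on the `BatchEncoding` itself: `model(tok)` passes the whole tokenizer output as the first positional argument. The frames are also in `transformers/models/bert/modeling_bert.py`, so without `trust_remote_code=True` the stock BERT implementation was loaded rather than the checkpoint's own code. A minimal sketch that passes the tensors by keyword follows. `select_model_inputs` and `masked_lm_forward` are hypothetical helper names, and whether the remote model then runs on a given machine still depends on the environment issues reported elsewhere on this page.

```python
from typing import Any, Dict, Mapping

# Keys forwarded to the model; other tokenizer outputs are left out.
MODEL_INPUT_KEYS = ("input_ids", "attention_mask")


def select_model_inputs(encoding: Mapping[str, Any]) -> Dict[str, Any]:
    """Pick the tensors to pass by keyword from a tokenizer output."""
    return {key: encoding[key] for key in MODEL_INPUT_KEYS if key in encoding}


def masked_lm_forward(dna: str, model_id: str = "zhihan1996/DNABERT-2-117M"):
    """Run the masked-LM model on one sequence using keyword arguments."""
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)
    model.eval()

    encoding = tokenizer(dna, return_tensors="pt")
    with torch.no_grad():
        # Unpack into keyword arguments instead of calling model(encoding).
        return model(**select_model_inputs(encoding))
```

For sequence embeddings rather than token predictions, the quickstart-style call `model(inputs)[0]` on an `AutoModel`, with `inputs = tokenizer(...)["input_ids"]`, returns the hidden states directly.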
This was not reported in the paper but I was interested in what the perplexity value was after pre-training. Thank you,
LeAnn