dnabert_2's People

Contributors

li-hongmin, sudhendra, zhihan1996

dnabert_2's Issues

lora_target_modules default wrong

Hi,

Thanks a lot for uploading DNABERT2.
I have noticed a small mistake when using LoRA: the default lora_target_modules ["query", "value"] do not seem to be correct. Through trial and error, ["q", "v"] worked for me.
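
For reference, a minimal PEFT sketch using the module names reported above as working (the exact names depend on DNABERT-2's custom bert_layers.py, so treat this as an assumption rather than the official configuration):

from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForSequenceClassification.from_pretrained(
    "zhihan1996/DNABERT-2-117M", num_labels=2, trust_remote_code=True
)
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q", "v"],  # "query"/"value" do not match this checkpoint's module names
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()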

Embedding long sequences

Hi,

Thanks for making DNABERT2 available!

I want to prepare embeddings of potentially long sequences for downstream use. How would you recommend I do that?

A) Taking the sequence as-is and embedding it all at once (I guess this is technically possible with ALiBi?)
B) Chunking the sequence into smaller pieces, as done for pre-training, and then concatenating the embeddings (128 nucleotides or 128 BPE tokens? Not sure); a rough chunking sketch for this option follows below.

Would appreciate your help!
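
To make option B concrete, here is a rough sketch of chunk-then-pool embedding (my own assumptions: 128-nucleotide chunks and mean pooling per chunk; not an official recommendation):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model.eval()

def embed_long_sequence(seq, chunk_size=128):
    """Embed a long DNA sequence by chunking over nucleotides (chunk_size is an assumption)."""
    chunk_embeddings = []
    for start in range(0, len(seq), chunk_size):
        chunk = seq[start:start + chunk_size]
        inputs = tokenizer(chunk, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs)[0]              # (1, num_tokens, 768) token embeddings
        chunk_embeddings.append(hidden.mean(dim=1))  # mean-pool tokens -> (1, 768)
    return torch.cat(chunk_embeddings, dim=0)        # (num_chunks, 768), concatenate or average downstream

emb = embed_long_sequence("ACGT" * 1000)
print(emb.shape)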

Functional variants identification with DNABERT-2

Dear Zhihan,

Thank you for the great contribution to the genome foundation model field. I am quite interested in the "Functional variants identification with DNABERT-2" section. However, I am not sure whether the "predicted high-attention regions" come from the pre-trained model or the fine-tuned model. If it is the fine-tuned model, which sub-task is it from?

We applied DNABERT to identify functional variants using around 700 million short variants in dbSNP (Sherry, 2001). Specifically, we selected only those variants that are located inside DNABERT-predicted high-attention regions and repeated the predictions, using sequences with altered alleles.

Thanks.

Shicheng

I think I get the point now. It is from the fine-tuning stage.

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions.

I am trying to run the fine-tuning example from the GitHub repo and I keep getting this error. I am able to successfully run the Quick Start section of the repo.

The provided data_path is /uufs/chpc.utah.edu/common/home/u1323098/sundar-group-space2/PHAGE/MODELS/GUE/prom/prom_300_tata
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
/uufs/chpc.utah.edu/common/home/u1323098/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:125: UserWarning: Unable to import Triton; defaulting MosaicBERT attention implementation to pytorch (this will reduce throughput when using this model).
  warnings.warn(
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['classifier.bias', 'classifier.weight', 'bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using cuda_amp half precision backend
***** Running training *****
  Num examples = 4,904
  Num Epochs = 3
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 462
  Number of trainable parameters = 117,070,082
***** Running Evaluation *****
  Num examples = 613
  Batch size = 32
Traceback (most recent call last):
  File "train.py", line 286, in <module>
    train()
  File "train.py", line 268, in train
    trainer.train()
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2006, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2287, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2993, in evaluate
    output = eval_loop(
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 3281, in evaluation_loop
    metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
  File "train.py", line 204, in compute_metrics
    return calculate_metric_with_sklearn(logits, labels)
  File "train.py", line 190, in calculate_metric_with_sklearn
    predictions = np.argmax(logits, axis=-1)
  File "<__array_function__ internals>", line 200, in argmax
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 1242, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out, **kwds)
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 613) + inhomogeneous part.

Here are the installation instructions that are working on my system (University of Utah CHPC):

salloc --account=soc-gpu-np --partition=soc-gpu-np --nodes=1 --gres=gpu:a100:1

conda activate dna
cd sundar-group-space2/PHAGE/MODELS/DNABERT_2/
python3 -m pip install -r requirements.txt
pip uninstall triton

Previously, I had changed requirements.txt to read:

einops  
transformers==4.28.0  
peft  
omegaconf  
torch  
evaluate  
accelerate

I also had to change the learning rate from ${lr} to 1e-4 in run_dnabert2.sh.
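
In other words, the edit amounts to giving lr a concrete value before the script forwards it to train.py (a sketch; the rest of run_dnabert2.sh is unchanged):

# in run_dnabert2.sh: set the learning rate explicitly instead of leaving ${lr} unset
lr=1e-4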

Pre-training code

Hi,

Any estimate when the pre-training code will be available?

Thanks

Fine-tuning process

Hi @Zhihan1996, thanks for providing the code for fine-tuning DNABERT-2. However, there is no mention of how to generate the dev.csv, test.csv, and train.csv files from our own dataset, or how to assign the 0/1 labels to the sequences. Could you please let me know how to do that?
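
Until an official answer lands, here is a minimal sketch of what the layout appears to be, judging from the GUE files (my assumption: CSV files with a sequence,label header, one example per row, integer labels):

import csv

# Hypothetical example data: (DNA sequence, binary label)
examples = [
    ("ACGTACGTACGT", 1),
    ("TTGACCTTGACC", 0),
]

# Write train.csv; dev.csv and test.csv follow the same layout.
with open("train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sequence", "label"])  # header assumed by the fine-tuning script
    writer.writerows(examples)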

GUE datasets details

Hi! Very interesting work!
Could you please provide some details about the datasets in GUE?
For example, the Splicing dataset has labels 0, 1, 2, which should correspond to donor, acceptor, and non-splicing site, but which is which?
There are five subfolders in the tf and mouse folders, but which transcription factors correspond to these folders 0-4?

Inference fails with output_all_encoded_layers=True

I am trying to extract the hidden-layer output from every layer of the model. Per the documentation, output_all_encoded_layers is a boolean that controls the content of the encoded_layers output, with default True. However, line 586 (https://huggingface.co/zhihan1996/DNABERT-2-117M/blob/main/bert_layers.py#L586) has it set to False, which matches what I actually observed (only the last layer was returned), in contrast to the documentation. When I set it to True, inference fails. The traceback is as follows:


RuntimeError Traceback (most recent call last)
Cell In[60], line 1
----> 1 output = model(**b, output_all_encoded_layers=True)

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None

File /lustre/scratch124/casm/team113/users/pg20/data/supporting/huggingface_models/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py:616, in BertModel.forward(self, input_ids, token_type_ids, attention_mask, position_ids, output_all_encoded_layers, masked_tokens_mask, **kwargs)
614 if masked_tokens_mask is None:
615 sequence_output = encoder_outputs[-1]
--> 616 pooled_output = self.pooler(
617 sequence_output) if self.pooler is not None else None
618 else:
619 # TD [2022-03-01]: the indexing here is very tricky.
620 attention_mask_bool = attention_mask.bool()

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None

File /lustre/scratch124/casm/team113/users/pg20/data/supporting/huggingface_models/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py:501, in BertPooler.forward(self, hidden_states, pool)
495 def forward(self,
496 hidden_states: torch.Tensor,
497 pool: Optional[bool] = True) -> torch.Tensor:
498 # We "pool" the model by simply taking the hidden state corresponding
499 # to the first token.
500 first_token_tensor = hidden_states[:, 0] if pool else hidden_states
--> 501 pooled_output = self.dense(first_token_tensor)
502 pooled_output = self.activation(pooled_output)
503 return pooled_output

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x5 and 768x768)

Steps to reproduce the error:
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
b = tokenizer('ATCG', return_tensors='pt', return_attention_mask=True)
output = model(**b, output_all_encoded_layers=True)

P.S. I am not using triton since it was failing in another step.
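
As a possible workaround while output_all_encoded_layers=True is broken, per-layer hidden states can be captured with forward hooks instead (a sketch; it assumes the custom model exposes its Transformer blocks as model.encoder.layer, as the traceback above suggests):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model.eval()

layer_outputs = []
hooks = [
    layer.register_forward_hook(lambda module, inputs, output: layer_outputs.append(output))
    for layer in model.encoder.layer  # assumed attribute path to the encoder blocks
]

batch = tokenizer("ATCG", return_tensors="pt")
with torch.no_grad():
    model(**batch)  # default output_all_encoded_layers=False, so the pooler path is untouched

for h in hooks:
    h.remove()

print(len(layer_outputs))  # one hidden-state tensor per encoder layer (possibly unpadded)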

Version and compatibility for Ubuntu

We tried to test DNABERT-2 on an AWS EC2 p2.xlarge instance with Ubuntu, CUDA 11.5, and gcc version 9 (we also tried version 11).
Every attempt failed.
We set up the environment using the requirements.txt posted on GitHub, but it still did not work.
The trouble comes with the command: hidden_states = model(inputs)[0] # [1, sequence_length, 768]

>>> hidden_states = model(inputs)[0] # [1, sequence_length, 768]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Traceback (most recent call last):
  File "<string>", line 21, in _fwd_kernel
KeyError: ('2-.-0-.-0--d6252949da17ceb5f3a278a70250af13-3b85c7bef5f0a641282f3b73af50f599-14de7de5c4da5794c8ca14e7e41a122d-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.float16, torch.float16, torch.float16, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, True, 128, 128), (True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (True, False), (True, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 937, in build_triton_ir
    generator.visit(fn.parse())
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 183, in visit_Module
    ast.NodeVisitor.generic_visit(self, node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 379, in generic_visit
    self.visit(item)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 252, in visit_FunctionDef
    has_ret = self.visit_compound_statement(node.body)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 177, in visit_compound_statement
    self.last_ret_type = self.visit(stmt)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 678, in visit_For
    self.visit_compound_statement(node.body)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 177, in visit_compound_statement
    self.last_ret_type = self.visit(stmt)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 319, in visit_AugAssign
    self.visit(assign)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 301, in visit_Assign
    values = self.visit(node.value)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 339, in visit_BinOp
    rhs = self.visit(node.right)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 797, in visit_Call
    return fn(*args, _builder=self.builder, **kws)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/impl/base.py", line 22, in wrapper
    return fn(*args, **kwargs)
TypeError: dot() got an unexpected keyword argument 'trans_b'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 608, in forward
    encoder_outputs = self.encoder(
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 446, in forward
    hidden_states = layer_module(hidden_states,
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 327, in forward
    attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 240, in forward
    self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 181, in forward
    attention = flash_attn_qkvpacked_func(qkv, bias)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line1021, in forward
    o, lse, ctx.softmax_scale = _flash_attn_forward(
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line826, in _flash_attn_forward
    _fwd_kernel[grid](  # type: ignore
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 90, in run
    return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 199, in run
    return self.fn.run(*args, **kwargs)
  File "<string>", line 41, in _fwd_kernel
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 1621, in compile
    next_module = compile(module)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 1550, in <lambda>
    lambda src: ast_to_ttir(src, signature, configs[0], constants)),
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 962, in ast_to_ttir
    mod, _ = build_triton_ir(fn, signature, specialization, constants)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 942, in build_triton_ir
    raise CompilationError(fn.src, node) from e
triton.compiler.CompilationError: at 114:24:
def _fwd_kernel(
    Q,
    K,
    V,
    Bias,
    Out,
    Lse,
    TMP,  # NOTE: TMP is a scratchpad buffer to workaround a compiler bug
    softmax_scale,
    stride_qb,
    stride_qh,
    stride_qm,
    stride_kb,
    stride_kh,
    stride_kn,
    stride_vb,
    stride_vh,
    stride_vn,
    stride_bb,
    stride_bh,
    stride_bm,
    stride_ob,
    stride_oh,
    stride_om,
    nheads,
    seqlen_q,
    seqlen_k,
    seqlen_q_rounded,
    headdim,
    CACHE_KEY_SEQLEN_Q,
    CACHE_KEY_SEQLEN_K,
    BIAS_TYPE: tl.constexpr,
    IS_CAUSAL: tl.constexpr,
    BLOCK_HEADDIM: tl.constexpr,
    EVEN_M: tl.constexpr,
    EVEN_N: tl.constexpr,
    EVEN_HEADDIM: tl.constexpr,
    BLOCK_M: tl.constexpr,
    BLOCK_N: tl.constexpr,
):
    start_m = tl.program_id(0)
    off_hb = tl.program_id(1)
    off_b = off_hb // nheads
    off_h = off_hb % nheads
    # off_b = tl.program_id(1)
    # off_h = tl.program_id(2)
    # off_hb = off_b * nheads + off_h
    # initialize offsets
    offs_m = start_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_d = tl.arange(0, BLOCK_HEADDIM)
    # Initialize pointers to Q, K, V
    # Adding parenthesis around indexing might use int32 math instead of int64 math?
    # https://github.com/openai/triton/issues/741
    # I'm seeing a tiny bit of difference (5-7us)
    q_ptrs = Q + off_b * stride_qb + off_h * stride_qh + (
        offs_m[:, None] * stride_qm + offs_d[None, :])
    k_ptrs = K + off_b * stride_kb + off_h * stride_kh + (
        offs_n[:, None] * stride_kn + offs_d[None, :])
    v_ptrs = V + off_b * stride_vb + off_h * stride_vh + (
        offs_n[:, None] * stride_vn + offs_d[None, :])
    if BIAS_TYPE == 'vector':
        b_ptrs = Bias + off_b * stride_bb + off_h * stride_bh + offs_n
    elif BIAS_TYPE == 'matrix':
        b_ptrs = Bias + off_b * stride_bb + off_h * stride_bh + (
            offs_m[:, None] * stride_bm + offs_n[None, :])
    else:
        raise ValueError("BIAS_TYPE must be one of {'vector', 'matrix'}")
    # initialize pointer to m and l
    t_ptrs = TMP + off_hb * seqlen_q_rounded + offs_m
    lse_i = tl.zeros([BLOCK_M], dtype=tl.float32) - float('inf')
    m_i = tl.zeros([BLOCK_M], dtype=tl.float32) - float('inf')
    acc_o = tl.zeros([BLOCK_M, BLOCK_HEADDIM], dtype=tl.float32)
    # load q: it will stay in SRAM throughout
    # [2022-10-30] TD: Triton bug - in the case of EVEN_M=True and EVEN_N=False, if we just call
    # tl.load(q_ptrs), we get the wrong output!
    if EVEN_M & EVEN_N:
        if EVEN_HEADDIM:
            q = tl.load(q_ptrs)
        else:
            q = tl.load(q_ptrs, mask=offs_d[None, :] < headdim, other=0.0)
    else:
        if EVEN_HEADDIM:
            q = tl.load(q_ptrs, mask=offs_m[:, None] < seqlen_q, other=0.0)
        else:
            q = tl.load(q_ptrs,
                        mask=(offs_m[:, None] < seqlen_q) &
                        (offs_d[None, :] < headdim),
                        other=0.0)
    # loop over k, v and update accumulator
    end_n = seqlen_k if not IS_CAUSAL else tl.minimum(
        (start_m + 1) * BLOCK_M, seqlen_k)
    for start_n in range(0, end_n, BLOCK_N):
        start_n = tl.multiple_of(start_n, BLOCK_N)
        # -- compute qk ----
        if EVEN_N & EVEN_M:  # If we just do "if EVEN_N", there seems to be some race condition
            if EVEN_HEADDIM:
                k = tl.load(k_ptrs + start_n * stride_kn)
            else:
                k = tl.load(k_ptrs + start_n * stride_kn,
                            mask=offs_d[None, :] < headdim,
                            other=0.0)
        else:
            if EVEN_HEADDIM:
                k = tl.load(k_ptrs + start_n * stride_kn,
                            mask=(start_n + offs_n)[:, None] < seqlen_k,
                            other=0.0)
            else:
                k = tl.load(k_ptrs + start_n * stride_kn,
                            mask=((start_n + offs_n)[:, None] < seqlen_k) &
                            (offs_d[None, :] < headdim),
                            other=0.0)
        qk = tl.zeros([BLOCK_M, BLOCK_N], dtype=tl.float32)
        qk += tl.dot(q, k, trans_b=True)
                        ^
>>>
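
One workaround reported elsewhere in this tracker (see the CHPC install notes above, and the fine-tuning log's warning about defaulting the MosaicBERT attention implementation to PyTorch) is to remove Triton so the model never enters the Triton FlashAttention code path:

pip uninstall -y triton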

Pre-training code?

Hello, I was just wondering if you know when you will make the pre-training code available on your GitHub? Also, is it very different from, or similar to, the pre-training code you provided for the original DNABERT?

Thank you for any assistance.
LeAnn

Multi-GPU support

Would the authors consider adding an implementation of _no_split_modules to offer multi-GPU options in Lightning for those wanting to use the embeddings?

Despite multiple trials and examining the model configuration, it seems that the model hosted on Hugging Face (`huggingface.co`) cannot handle sequences that exceed a length of 512 tokens. I've provided the relevant code below for clarity.

Part 1: Tokenization and Dataset Preparation

from transformers import AutoTokenizer, BertForSequenceClassification
from torch.utils.data import Dataset

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = BertForSequenceClassification.from_pretrained("zhihan1996/DNABERT-2-117M", num_labels=8)

class DNADataset(Dataset):
    def __init__(self, data, tokenizer):
        self.data = data
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        seq, label = self.data[idx]
        inputs = self.tokenizer(seq, return_tensors='pt', padding='max_length', max_length=600, truncation=True)
        return {
            'input_ids': inputs["input_ids"].squeeze(),
            'label': label
        }

Part 2: Retrieving Model Configuration

from transformers import AutoConfig

config = AutoConfig.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
print(config.max_position_embeddings)
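
For what it's worth, a possible workaround (a sketch, assuming the checkpoint's auto_map exposes the custom ALiBi-based classes for sequence classification, as the fine-tuning logs elsewhere in this tracker suggest) is to load the classifier with trust_remote_code=True, so the repo's own bert_layers.py is used instead of the stock BERT class with 512 absolute position embeddings:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Sketch: trust_remote_code=True pulls the custom ALiBi implementation from the hub repo,
# which does not rely on max_position_embeddings=512 absolute positions.
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "zhihan1996/DNABERT-2-117M", num_labels=8, trust_remote_code=True
)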

Sequences longer than 512 bases, and evaluation examples on our own dataset

Firstly, thank you so much for this implementation - this is really useful!

Is there still an input length constraint in the pretrained model, though? I noticed that when I feed in a sequence that generates more than 512 tokens (the original maximum BERT input length), the model fails. Is this expected behaviour? The error is given below.

If yes, then would you have any recommendations for dealing with sequences that generate more than 512 tokens?

Cell In[73], line 1
----> 1 dnabert(input_ids, attention_mask, token_type_ids)

File ~/predictor/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/predictor/venv/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:1015, in BertModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
   1008 # Prepare head mask if needed
   1009 # 1.0 in head_mask indicate we keep the head
   1010 # attention_probs has shape bsz x n_heads x N x N
   1011 # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
   1012 # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
   1013 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
-> 1015 embedding_output = self.embeddings(
   1016     input_ids=input_ids,
   1017     position_ids=position_ids,
   1018     token_type_ids=token_type_ids,
   1019     inputs_embeds=inputs_embeds,
   1020     past_key_values_length=past_key_values_length,
   1021 )
   1022 encoder_outputs = self.encoder(
   1023     embedding_output,
   1024     attention_mask=extended_attention_mask,
   (...)
   1032     return_dict=return_dict,
   1033 )
   1034 sequence_output = encoder_outputs[0]

File ~/predictor/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/predictor/venv/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:238, in BertEmbeddings.forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    236 if self.position_embedding_type == "absolute":
    237     position_embeddings = self.position_embeddings(position_ids)
--> 238     embeddings += position_embeddings
    239 embeddings = self.LayerNorm(embeddings)
    240 embeddings = self.dropout(embeddings)

RuntimeError: The size of tensor a (514) must match the size of tensor b (512) at non-singleton dimension 1

Also, would you have any examples of fine-tuning on our own datasets or on a regression dataset coming up soon?

Get logits and LM loss from the DNABert

Hi, I tried to get the logits for an input sequence using DNABERT-2; here is my code:

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model.eval().cuda()

seq = "TCCCACTATTTGTCGGCTAGCCAGATTGTTGTGGTCTGATTAAAGTT\
TCAATTTATACCTTACAATGATGTAAGGTACGTGTAAGAGAAATCGATGGGATA\
TTTTTTTACAACAAGGTATTCTTAAAGTAAGAGTTATACGCTATGTGGAAAAGAGGTGTTTAAG"

tokens_ids = tokenizer.batch_encode_plus([seq], return_tensors="pt")["input_ids"]
attention_mask = tokens_ids != tokenizer.pad_token_id

torch_outs = model(tokens_ids.cuda(),
                   attention_mask=attention_mask.cuda(),
                   encoder_attention_mask=attention_mask.cuda(),
                   output_hidden_states=True,
                   labels=tokens_ids.cuda())

The returned torch_outs is a tuple of two tensors: torch.Size([1, 39, 768]) and torch.Size([1, 768]). I assume the first one holds the embeddings for each token and the second is a pooled embedding of the whole sequence?

Is it possible for DNABERT-2 to return the logits for each token?
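
For per-token logits, something along these lines may work (a sketch; it assumes the checkpoint exposes a masked-LM head via AutoModelForMaskedLM, which is not guaranteed):

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
mlm_model = AutoModelForMaskedLM.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
mlm_model.eval()

seq = "TCCCACTATTTGTCGGCTAGCCAGATTGTTGTGGTCTGATTAAAGTT"
inputs = tokenizer(seq, return_tensors="pt")
with torch.no_grad():
    out = mlm_model(**inputs, labels=inputs["input_ids"])

# Assumes a MaskedLMOutput-style return; adjust the indexing if a plain tuple comes back.
print(out.logits.shape)  # (batch, num_tokens, vocab_size): per-token logits
print(out.loss)          # language-modeling loss against the provided labels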

ALiBi not working

Hi, I tried following the README with

from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M")
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M")

This does not work correctly with ALiBi, as it does not use the model class defined at https://huggingface.co/zhihan1996/DNABERT-2-117M/blob/main/bert_layers.py
Instead, it uses the default Hugging Face BERT class, which, if I understand correctly, does not have the ALiBi utilities such as rebuild_alibi_tensor.

I encountered the error by trying to run a sequence longer than 512 tokens. Should the model be loaded in another way so that ALiBi works correctly?
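
For what it's worth, loading with trust_remote_code=True (as in the Quick Start and the other issues on this page) makes AutoModel pull the custom bert_layers.py from the hub, which is the ALiBi-aware class:

from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True loads the repo's own BertModel (ALiBi attention) instead of
# the stock Hugging Face BERT class that lacks rebuild_alibi_tensor.
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)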

Downstream Association Study

Hi Zhihan,

I am wondering what you think about downstream association studies that identify phenotype-associated variants/regions with a genome foundation model. Do you think the current model is suitable for such a task? If not, do you have any comments on how to develop a good model for it?

Thanks

Shicheng

Issue regarding the config_class

ValueError Traceback (most recent call last)
in <cell line: 1>()
----> 1 model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

1 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py in register(cls, config_class, model_class, exist_ok)
534 """
535 if hasattr(model_class, "config_class") and model_class.config_class != config_class:
--> 536 raise ValueError(
537 "The model class you are passing has a config_class attribute that is not consistent with the "
538 f"config class you passed (model has {model_class.config_class} and you passed {config_class}. Fix "

ValueError: The model class you are passing has a config_class attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.81ac6a98387cf94bc283553260f3fa6b88cef2fa.configuration_bert.BertConfig'>. Fix one of those so they match!

Can you please help me out?

ValueError

Hi, I tried to run fine-tuning with train.py for the first time. I got an error at the following lines:
parser = transformers.HfArgumentParser((ModelArguments, DataArguments, TrainingArguments))
model_args, data_args, training_args = parser.parse_args_into_dataclasses()

Below is the error message I got:

  ValueError: (Field(name=None,type=None,default='steps',default_factory=<dataclasses._MISSING_TYPE object at 0x0000016635086850>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=None),) is not a valid IntervalStrategy
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "C:\ProgramData\MiniConda\envs\dnabert\lib\enum.py", line 670, in __new__
      raise exc
    File "C:\ProgramData\MiniConda\envs\dnabert\lib\enum.py", line 653, in __new__
      result = cls._missing_(value)
    File "C:\ProgramData\MiniConda\envs\dnabert\lib\site-packages\transformers\utils\generic.py", line 348, in _missing_
      raise ValueError(
  ValueError: (Field(name=None,type=None,default='steps',default_factory=<dataclasses._MISSING_TYPE object at 0x0000016635086850>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=None),) is not a valid IntervalStrategy, please select one of ['no', 'steps', 'epoch']

Any idea how to fix that?
Thank you!
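
If it helps while debugging, an explicit command line along these lines sidesteps any unset defaults (a sketch; the flag names assume the standard TrainingArguments plus the model_name_or_path/data_path fields visible in the logs elsewhere in this tracker, and the data path shown is just an example):

python train.py \
    --model_name_or_path zhihan1996/DNABERT-2-117M \
    --data_path ./GUE/prom/prom_300_tata \
    --output_dir ./output \
    --evaluation_strategy steps \
    --save_strategy steps \
    --num_train_epochs 3 \
    --per_device_train_batch_size 32 \
    --learning_rate 1e-4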

StopIteration: Caught StopIteration in replica 0 on device 0.

Hello,

I have come across the following error when trying to run this model on Bridges-2 using 8 GPUs. I set up a fresh conda environment as detailed in the README.md.

StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/jet/home/ahabib/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/jet/home/ahabib/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/jet/home/ahabib/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 862, in forward
    outputs = self.bert(
  File "/jet/home/ahabib/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/jet/home/ahabib/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 608, in forward
    encoder_outputs = self.encoder(
  File "/jet/home/ahabib/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/jet/home/ahabib/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 416, in forward
    dtype=next(self.parameters()).dtype)  # fp16 compatibility
StopIteration

  0%|          | 0/5721 [00:10<?, ?it/s]++ date '+%Y-%m-%d %T'

I have attached my full output and batch script below! Thank you!

output.txt
train.txt

Error while loading the model

Thanks for the implementation of DNABERT-2. I had a small issue; could you please help fix it?

When I run model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True), I get the following error. Any idea how to fix it?

ValueError: The model class you are passing has a config_class attribute that is not consistent with the config class you passed (model has and you passed . Fix one of those so they match!

Pre-training Data

In the paper you state, "In order to facilitate further research on large-scale genome foundational models, we have collated and made available multi-species genome datasets for both pre-training of models (Sec. 4.1) and benchmarking (Sec. 4.2)."

but I cannot see where these datasets are; I have looked on both Hugging Face and your GitHub.

Have I overlooked them somewhere?

GUE: Splice site recognition in regards to sequence labeling/NER vs classification

Hi Zhihan,

Hope you're doing well. I had a wonderful time reading your paper on the improvements made to DNABERT, and I am trying to set up a language-model evaluation framework.

However, I question whether the labels for the splice site recognition benchmark truly represent how a user would approach this task. Going through the train/dev/test sets, I noticed that the labels only annotate whether the entire sequence is {no splice site, donor, acceptor}. Current state-of-the-art tools, such as SpliceAI, provide further information, similar to an NER task, by labeling every nucleotide within a given sequence. Has DNABERT-2 previously been evaluated on splice sites as an NER/sequence-labeling task?

CompilationError: at 114:24:

Epoch [1/3]

KeyError Traceback (most recent call last)
File <string>:21, in _fwd_kernel(Q, K, V, Bias, Out, Lse, TMP, softmax_scale, stride_qb, stride_qh, stride_qm, stride_kb, stride_kh, stride_kn, stride_vb, stride_vh, stride_vn, stride_bb, stride_bh, stride_bm, stride_ob, stride_oh, stride_om, nheads, seqlen_q, seqlen_k, seqlen_q_rounded, headdim, CACHE_KEY_SEQLEN_Q, CACHE_KEY_SEQLEN_K, BIAS_TYPE, IS_CAUSAL, BLOCK_HEADDIM, EVEN_M, EVEN_N, EVEN_HEADDIM, BLOCK_M, BLOCK_N, grid, num_warps, num_stages, extern_libs, stream, warmup)

KeyError: ('2-.-0-.-0--d6252949da17ceb5f3a278a70250af13-3b85c7bef5f0a641282f3b73af50f599-14de7de5c4da5794c8ca14e7e41a122d-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.float16, torch.float16, torch.float16, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, True, 128, 128), (True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (False, False), (False, False)))

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:937, in build_triton_ir(fn, signature, specialization, constants)
936 try:
--> 937 generator.visit(fn.parse())
938 except Exception as e:

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:183, in CodeGenerator.visit_Module(self, node)
182 def visit_Module(self, node):
--> 183 ast.NodeVisitor.generic_visit(self, node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:379, in NodeVisitor.generic_visit(self, node)
378 if isinstance(item, AST):
--> 379 self.visit(item)
380 elif isinstance(value, AST):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:252, in CodeGenerator.visit_FunctionDef(self, node)
251 # visit function body
--> 252 has_ret = self.visit_compound_statement(node.body)
253 # finalize function

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:177, in CodeGenerator.visit_compound_statement(self, stmts)
176 for stmt in stmts:
--> 177 self.last_ret_type = self.visit(stmt)
178 if isinstance(stmt, ast.Return):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:678, in CodeGenerator.visit_For(self, node)
677 self.scf_stack.append(node)
--> 678 self.visit_compound_statement(node.body)
679 self.scf_stack.pop()

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:177, in CodeGenerator.visit_compound_statement(self, stmts)
176 for stmt in stmts:
--> 177 self.last_ret_type = self.visit(stmt)
178 if isinstance(stmt, ast.Return):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:319, in CodeGenerator.visit_AugAssign(self, node)
318 assign = ast.Assign(targets=[node.target], value=rhs)
--> 319 self.visit(assign)
320 return self.get_value(name)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:301, in CodeGenerator.visit_Assign(self, node)
300 names = _names[0]
--> 301 values = self.visit(node.value)
302 if not isinstance(names, tuple):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:339, in CodeGenerator.visit_BinOp(self, node)
338 lhs = self.visit(node.left)
--> 339 rhs = self.visit(node.right)
340 fn = {
341 ast.Add: 'add',
342 ast.Sub: 'sub',
(...)
352 ast.BitXor: 'xor',
353 }[type(node.op)]

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:797, in CodeGenerator.visit_Call(self, node)
795 if (hasattr(fn, 'self') and self.is_triton_tensor(fn.self))
796 or impl.is_builtin(fn):
--> 797 return fn(*args, _builder=self.builder, **kws)
798 if fn in self.builtins.values():

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/impl/base.py:22, in builtin.<locals>.wrapper(*args, **kwargs)
18 raise ValueError(
19 "Did you forget to add @triton.jit ? "
20 "(_builder argument must be provided outside of JIT functions.)"
21 )
---> 22 return fn(*args, **kwargs)

TypeError: dot() got an unexpected keyword argument 'trans_b'

The above exception was the direct cause of the following exception:

CompilationError Traceback (most recent call last)
Cell In[15], line 1
----> 1 teacher_train(T_model, cfg, train_loader, test_loader)

Cell In[14], line 39, in teacher_train(model, config, train_loader, test_loader)
37 mask = mask.to(config.device)
38 labels = labels.to(config.device)
---> 39 outputs = model(ids, mask)
40 model.zero_grad()
41 loss = F.cross_entropy(outputs, labels)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

Cell In[12], line 12, in BERT_Model.forward(self, context, mask)
11 def forward(self, context, mask):
---> 12 outputs = self.bert(context, attention_mask=mask)
13 pooled = outputs[1]
14 out = self.fc(pooled)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:608, in BertModel.forward(self, input_ids, token_type_ids, attention_mask, position_ids, output_all_encoded_layers, masked_tokens_mask, **kwargs)
605 first_col_mask[:, 0] = True
606 subset_mask = masked_tokens_mask | first_col_mask
--> 608 encoder_outputs = self.encoder(
609 embedding_output,
610 attention_mask,
611 output_all_encoded_layers=output_all_encoded_layers,
612 subset_mask=subset_mask)
614 if masked_tokens_mask is None:
615 sequence_output = encoder_outputs[-1]

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:446, in BertEncoder.forward(self, hidden_states, attention_mask, output_all_encoded_layers, subset_mask)
444 if subset_mask is None:
445 for layer_module in self.layer:
--> 446 hidden_states = layer_module(hidden_states,
447 cu_seqlens,
448 seqlen,
449 None,
450 indices,
451 attn_mask=attention_mask,
452 bias=alibi_attn_mask)
453 if output_all_encoded_layers:
454 all_encoder_layers.append(hidden_states)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:327, in BertLayer.forward(self, hidden_states, cu_seqlens, seqlen, subset_idx, indices, attn_mask, bias)
305 def forward(
306 self,
307 hidden_states: torch.Tensor,
(...)
313 bias: Optional[torch.Tensor] = None,
314 ) -> torch.Tensor:
315 """Forward pass for a BERT layer, including both attention and MLP.
316
317 Args:
(...)
325 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
326 """
--> 327 attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
328 subset_idx, indices, attn_mask, bias)
329 layer_output = self.mlp(attention_output)
330 return layer_output

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:240, in BertUnpadAttention.forward(self, input_tensor, cu_seqlens, max_s, subset_idx, indices, attn_mask, bias)
218 def forward(
219 self,
220 input_tensor: torch.Tensor,
(...)
226 bias: Optional[torch.Tensor] = None,
227 ) -> torch.Tensor:
228 """Forward pass for scaled self-attention without padding.
229
230 Arguments:
(...)
238 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
239 """
--> 240 self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
241 attn_mask, bias)
242 if subset_idx is not None:
243 return self.output(index_first_axis(self_output, subset_idx),
244 index_first_axis(input_tensor, subset_idx))

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:181, in BertUnpadSelfAttention.forward(self, hidden_states, cu_seqlens, max_seqlen_in_batch, indices, attn_mask, bias)
179 bias_dtype = bias.dtype
180 bias = bias.to(torch.float16)
--> 181 attention = flash_attn_qkvpacked_func(qkv, bias)
182 attention = attention.to(orig_dtype)
183 bias = bias.to(bias_dtype)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs)
503 if not torch._C._are_functorch_transforms_active():
504 # See NOTE: [functorch vjp and autograd interaction]
505 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506 return super().apply(*args, **kwargs) # type: ignore[misc]
508 if cls.setup_context == _SingleLevelFunction.setup_context:
509 raise RuntimeError(
510 'In order to use an autograd.Function with functorch transforms '
511 '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
512 'staticmethod. For more details, please see '
513 'https://pytorch.org/docs/master/notes/extending.func.html')

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:1021, in _FlashAttnQKVPackedFunc.forward(ctx, qkv, bias, causal, softmax_scale)
1019 if qkv.stride(-1) != 1:
1020 qkv = qkv.contiguous()
-> 1021 o, lse, ctx.softmax_scale = _flash_attn_forward(
1022 qkv[:, :, 0],
1023 qkv[:, :, 1],
1024 qkv[:, :, 2],
1025 bias=bias,
1026 causal=causal,
1027 softmax_scale=softmax_scale)
1028 ctx.save_for_backward(qkv, o, lse, bias)
1029 ctx.causal = causal

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:826, in _flash_attn_forward(q, k, v, bias, causal, softmax_scale)
823 # BLOCK = 128
824 # num_warps = 4 if d <= 64 else 8
825 grid = lambda META: (triton.cdiv(seqlen_q, META['BLOCK_M']), batch * nheads)
--> 826 _fwd_kernel[grid]( # type: ignore
827 q,
828 k,
829 v,
830 bias,
831 o,
832 lse,
833 tmp,
834 softmax_scale,
835 q.stride(0),
836 q.stride(2),
837 q.stride(1),
838 k.stride(0),
839 k.stride(2),
840 k.stride(1),
841 v.stride(0),
842 v.stride(2),
843 v.stride(1),
844 *bias_strides,
845 o.stride(0),
846 o.stride(2),
847 o.stride(1),
848 nheads,
849 seqlen_q,
850 seqlen_k,
851 seqlen_q_rounded,
852 d,
853 seqlen_q // 32,
854 seqlen_k // 32, # key for triton cache (limit number of compilations)
855 # Can't use kwargs here because triton autotune expects key to be args, not kwargs
856 # IS_CAUSAL=causal, BLOCK_HEADDIM=d,
857 bias_type,
858 causal,
859 BLOCK_HEADDIM,
860 # BLOCK_M=BLOCK, BLOCK_N=BLOCK,
861 # num_warps=num_warps,
862 # num_stages=1,
863 )
864 return o, lse, softmax_scale

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/runtime/autotuner.py:90, in Autotuner.run(self, *args, **kwargs)
88 if config.pre_hook is not None:
89 config.pre_hook(self.nargs)
---> 90 return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/runtime/autotuner.py:199, in Heuristics.run(self, *args, **kwargs)
197 for v, heur in self.values.items():
198 kwargs[v] = heur({**dict(zip(self.arg_names, args)), **kwargs})
--> 199 return self.fn.run(*args, **kwargs)

File :41, in _fwd_kernel(Q, K, V, Bias, Out, Lse, TMP, softmax_scale, stride_qb, stride_qh, stride_qm, stride_kb, stride_kh, stride_kn, stride_vb, stride_vh, stride_vn, stride_bb, stride_bh, stride_bm, stride_ob, stride_oh, stride_om, nheads, seqlen_q, seqlen_k, seqlen_q_rounded, headdim, CACHE_KEY_SEQLEN_Q, CACHE_KEY_SEQLEN_K, BIAS_TYPE, IS_CAUSAL, BLOCK_HEADDIM, EVEN_M, EVEN_N, EVEN_HEADDIM, BLOCK_M, BLOCK_N, grid, num_warps, num_stages, extern_libs, stream, warmup)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:1621, in compile(fn, **kwargs)
1619 next_module = parse(path)
1620 else:
-> 1621 next_module = compile(module)
1622 fn_cache_manager.put(next_module, f"{name}.{ir}")
1623 if os.path.exists(path):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:1550, in compile..(src)
1545 extern_libs = kwargs.get("extern_libs", dict())
1546 # build compilation stages
1547 stages = {
1548 "ast": (lambda path: fn, None),
1549 "ttir": (lambda path: parse_mlir_module(path, context),
-> 1550 lambda src: ast_to_ttir(src, signature, configs[0], constants)),
1551 "ttgir": (lambda path: parse_mlir_module(path, context),
1552 lambda src: ttir_to_ttgir(src, num_warps, num_stages, capability)),
1553 "llir": (lambda path: Path(path).read_text(),
1554 lambda src: ttgir_to_llir(src, extern_libs, capability)),
1555 "ptx": (lambda path: Path(path).read_text(),
1556 lambda src: llir_to_ptx(src, capability)),
1557 "cubin": (lambda path: Path(path).read_bytes(),
1558 lambda src: ptx_to_cubin(src, capability))
1559 }
1560 # find out the signature of the function
1561 if isinstance(fn, triton.runtime.JITFunction):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:962, in ast_to_ttir(fn, signature, specialization, constants)
961 def ast_to_ttir(fn, signature, specialization, constants):
--> 962 mod, _ = build_triton_ir(fn, signature, specialization, constants)
963 return optimize_triton_ir(mod)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:942, in build_triton_ir(fn, signature, specialization, constants)
940 if node is None or isinstance(e, (NotImplementedError, CompilationError)):
941 raise e
--> 942 raise CompilationError(fn.src, node) from e
943 ret = generator.module
944 # module takes ownership of the context

Train Dev Test Splits for GUE Benchmark Differ from Reported Paper Splits

Hi Zhihan,

Thank you very much for your hard work on this repo, much appreciated!

I have been trying to recreate the results of DNABERT-2 on the COVID variant prediction task from the GUE benchmark, and I have noticed a discrepancy with the train/validation/test split sizes reported in the paper.

In the paper, it is reported that the breakdown for these splits is: 77669 / 7000 / 7000.
However, using the files provided here, I get a breakdown for the splits as: 73335 / 9168 / 9168.

I haven't checked any of the other benchmark datasets but this may be an issue with others too.

Is it possible for you to fix the datasets provided to the correct splits given in the paper? This will allow others to recreate and validate your results.
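
To make the comparison concrete, here is a minimal sketch for counting the examples in each split. The folder name below is a hypothetical placeholder for wherever the covid task lives in a local copy of GUE; adjust it to your layout.

import pandas as pd

# Hypothetical local path to the extracted GUE covid task; adjust as needed.
data_dir = "GUE/virus/covid"

for split in ["train", "dev", "test"]:
    df = pd.read_csv(f"{data_dir}/{split}.csv")
    print(split, len(df))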

install triton

Why not install Triton by simply running pip install triton?
It seems that installing Triton from source runs into errors that are difficult to resolve.

TypeError: __init__() got an unexpected keyword argument 'approximate'

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

Hello, when I ran the above code, the following error was reported. Can you help me figure out what the reason is?


TypeError Traceback (most recent call last)
Cell In[6], line 5
2 from transformers import AutoTokenizer, AutoModel
4 tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
----> 5 model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py:479, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
475 model_class = get_class_from_dynamic_module(
476 class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs
477 )
478 _ = hub_kwargs.pop("code_revision", None)
--> 479 return model_class.from_pretrained(
480 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
481 )
482 elif type(config) in cls._model_mapping.keys():
483 model_class = _get_model_class(config, cls._model_mapping)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/transformers/modeling_utils.py:2675, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
2672 init_contexts.append(init_empty_weights())
2674 with ContextManagers(init_contexts):
-> 2675 model = cls(config, *model_args, **model_kwargs)
2677 # Check first if we are from_pt
2678 if use_keep_in_fp32_modules:

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:570, in BertModel.__init__(self, config, add_pooling_layer)
568 super(BertModel, self).__init__(config)
569 self.embeddings = BertEmbeddings(config)
--> 570 self.encoder = BertEncoder(config)
571 self.pooler = BertPooler(config) if add_pooling_layer else None
572 self.post_init()

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:345, in BertEncoder.__init__(self, config)
343 def __init__(self, config):
344 super().__init__()
--> 345 layer = BertLayer(config)
346 self.layer = nn.ModuleList(
347 [copy.deepcopy(layer) for _ in range(config.num_hidden_layers)])
349 self.num_attention_heads = config.num_attention_heads

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:303, in BertLayer.__init__(self, config)
301 super(BertLayer, self).__init__()
302 self.attention = BertUnpadAttention(config)
--> 303 self.mlp = BertGatedLinearUnitMLP(config)

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:270, in BertGatedLinearUnitMLP.__init__(self, config)
266 self.config = config
267 self.gated_layers = nn.Linear(config.hidden_size,
268 config.intermediate_size * 2,
269 bias=False)
--> 270 self.act = nn.GELU(approximate='none')
271 self.wo = nn.Linear(config.intermediate_size, config.hidden_size)
272 self.dropout = nn.Dropout(config.hidden_dropout_prob)

TypeError: __init__() got an unexpected keyword argument 'approximate'
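
For context: the approximate keyword of nn.GELU was only added in PyTorch 1.12, so this error usually indicates an older torch installation. A minimal check (this is a diagnosis sketch, not a confirmed fix for this exact environment):

import torch
from packaging import version

print(torch.__version__)
# nn.GELU(approximate=...) only exists in torch >= 1.12; older versions raise
# TypeError: __init__() got an unexpected keyword argument 'approximate'
assert version.parse(torch.__version__.split("+")[0]) >= version.parse("1.12"), \
    "Upgrade PyTorch to >= 1.12 before loading the DNABERT-2 remote code"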

ValueError: setting an array element with a sequence.

Hello, I was testing the finetune code but I got this error:

Traceback (most recent call last):██████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 51.26it/s]
File "bin/DNABERT_2/finetune/train.py", line 286, in
train()
File "bin/DNABERT_2/finetune/train.py", line 268, in train
trainer.train()
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 1645, in train
return inner_training_loop(
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2020, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2321, in _maybe_log_save_evaluate
metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 3053, in evaluate
output = eval_loop(
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 3353, in evaluation_loop
metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
File "bin/DNABERT_2/finetune/train.py", line 204, in compute_metrics
return calculate_metric_with_sklearn(logits, labels)
File "bin/DNABERT_2/finetune/train.py", line 190, in calculate_metric_with_sklearn
predictions = np.argmax(logits, axis=-1)
File "<array_function internals>", line 200, in argmax
File "/miniconda/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 1242, in argmax
return _wrapfunc(a, 'argmax', axis=axis, out=out, **kwds)
File "/miniconda/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
return _wrapit(obj, method, *args, **kwds)
File "/miniconda/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 471) + inhomogeneous part.
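
A hedged guess at the cause: during evaluation the model returns a tuple (logits plus extra tensors of different shapes), so np.argmax receives a ragged object. Below is a minimal defensive version of the metric function; the names mirror those in the traceback, but the fix itself is an assumption, not the repository's official patch.

import numpy as np
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

def calculate_metric_with_sklearn(logits, labels):
    # If the model returned (logits, hidden_states, ...), keep only the logits.
    if isinstance(logits, (tuple, list)):
        logits = logits[0]
    predictions = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, predictions),
        "f1": f1_score(labels, predictions, average="macro", zero_division=0),
        "matthews_correlation": matthews_corrcoef(labels, predictions),
    }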

Error when Loading DNABERT Model using `AutoModel`

Issue:

I am trying to load a pretrained DNABERT model from the model hub using the AutoModel and AutoTokenizer classes from the transformers library. However, I am encountering an error related to inconsistent config_class attributes.

Code to Reproduce:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

Error Message:

ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.81ac6a98387cf94bc283553260f3fa6b88cef2fa.configuration_bert.BertConfig'>. Fix one of those so they match!

Environment Details:

  • Python version: 3.10
  • OS: Windows11

human vs non-human data ratio

Dear Zhihan,

I am curious whether there was any evaluation to identify the best ratio of human vs. non-human data. The GENA-LM model included the human T2T assembly and 1000 Genomes Project data to make the human to non-human ratio 1:1.

I am wondering how you thought about this ratio when developing this project/model.

Thanks.

Shicheng

No module named 'transformers_modules.zhihan1996.DNABERT-2-117M.bert_padding'

Hi,

When I run the code from the README.md:

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

I receive the error shown at the bottom of this post. I have created a fresh conda environment and installed the packages listed in the README.md.


---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/Users/colm/repos/dsb-gene-function-prediction/notebooks/exploratory/generate_embeddings_for_dna_sequences.ipynb Cell 1 in 5
      2 from transformers import AutoTokenizer, AutoModelForMaskedLM
      4 tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True, revision="5fd206e")
----> 5 model = AutoModelForMaskedLM.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True, revision="5fd206e")

File ~/anaconda3/envs/DSBPredict/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py:430, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    428     class_ref = config.auto_map[cls.__name__]
    429     module_file, class_name = class_ref.split(".")
--> 430     model_class = get_class_from_dynamic_module(
    431         pretrained_model_name_or_path, module_file + ".py", class_name, **kwargs
    432     )
    433     return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
    434 elif type(config) in cls._model_mapping.keys():

File ~/anaconda3/envs/DSBPredict/lib/python3.8/site-packages/transformers/models/auto/dynamic.py:231, in get_class_from_dynamic_module(pretrained_model_name_or_path, module_file, class_name, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, **kwargs)
    229 # And lastly we get the class inside our newly created module
    230 final_module = os.path.join(full_submodule, module_name.replace(".py", ""))
--> 231 return get_class_in_module(class_name, final_module)

File ~/anaconda3/envs/DSBPredict/lib/python3.8/site-packages/transformers/models/auto/dynamic.py:103, in get_class_in_module(class_name, module_path)
     99 """
    100 Import a module on the cache directory for modules and extract a class from it.
    101 """
...
     25                                             unpad_input, unpad_input_only)
     27 try:
     28     from .flash_attn_triton import flash_attn_qkvpacked_func

ModuleNotFoundError: No module named 'transformers_modules.zhihan1996.DNABERT-2-117M.bert_padding'
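
One hedged guess is a stale copy of the remote code in the local Hugging Face modules cache (an earlier download that predates bert_padding.py). Clearing that cache and re-running the import is a low-risk thing to try; the path below is the default cache location and is an assumption about this setup.

import shutil
from pathlib import Path

# Default location where transformers caches downloaded remote-code modules.
cache = Path.home() / ".cache" / "huggingface" / "modules" / "transformers_modules" / "zhihan1996"
if cache.exists():
    shutil.rmtree(cache)  # forces a fresh download of the DNABERT-2 remote code on the next load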

[Feature request] incorporation of flash-attention 2

Thank you for your exceptional work!
Your model outperforms others with its efficiency in terms of parameters and speed. We have noticed that flash-attention has recently released version 2, which has greatly improved computation speed. We kindly request the incorporation of this update as it is urgently needed.
Once again, thank you for your hard work! :)

On the running results and usage issues of the model

Hi,

Thank you so much for DNABERT-2! I ran into some issues when running the model and using its outputs.

First, when I run the sample code, I find that hidden_states.shape is torch.Size([1, 17, 768]), not torch.Size([1, len(dna), 768]).

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
hidden_states = model(inputs)[0] # [1, sequence_length, 768]

# Question 1 : torch.Size([1, 17, 768]) not torch.Size([1, len(dna), 768])
print(hidden_states.shape)
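
The 17 comes from BPE tokenization: DNABERT-2 tokenizes the sequence into multi-nucleotide tokens plus special tokens, so the second dimension is the number of tokens, not the number of bases. A quick sketch to see this, reusing the tokenizer and inputs from above:

tokens = tokenizer.tokenize(dna)
print(len(tokens), tokens)   # number of BPE tokens for this sequence
print(inputs.shape)          # [1, number of tokens plus special tokens], here [1, 17]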

Secondly, I have observed that model(inputs) returns two tensors, with sizes torch.Size([1, 17, 768]) and torch.Size([1, 768]). What does the second tensor mean?

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors='pt')["input_ids"]

result = model(inputs)

# output: 2
print(len(result))

# Question 2: What is the meaning of result[1].shape?
# output: torch.Size([1, 17, 768]) torch.Size([1, 768])
print(result[0].shape, result[1].shape)
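
A hedged reading of the second output, based on the remote bert_layers.py shown in other tracebacks here (which constructs a BertPooler when add_pooling_layer is enabled): result[1] appears to be the pooled representation of the first ([CLS]) token, of shape [1, 768]. A small sketch comparing it with a simple mean over the token embeddings:

sequence_output, pooled_output = result[0], result[1]
mean_pooled = sequence_output.mean(dim=1)   # [1, 768], mean over the 17 token positions
print(pooled_output.shape, mean_pooled.shape)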

Third, if multiple DNA sequences are input simultaneously with padding enabled, how can I obtain the encoding of each sequence at its own length (instead of the padded length)?

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

dna = ["ACGTAGCATCGGATCTATTTAGC", "ACGTAGCATCGGATCTATCTATCGACACCTATCATCTCGTTAGC", "ACGTATTATCGATCTACGAGCATCTCGTTAGC"]
inputs = tokenizer(dna, return_tensors='pt', padding=True)["input_ids"]
hidden_states = model(inputs)[0]  # [1, sequence_length, 768]

# torch.Size([3, 13, 768])
print(hidden_states.shape)

For the third question, this is how I did it previously with ESM. I want to obtain the same kind of sequence_representations.

import torch

# Load ESM-2 model
model, alphabet = torch.hub.load("facebookresearch/esm:main", "esm2_t33_650M_UR50D")
batch_converter = alphabet.get_batch_converter()
model.eval()  # disables dropout for deterministic results

# Prepare data (first 2 sequences from ESMStructuralSplitDataset superfamily / 4)
data = [
    ("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"),
    ("protein2", "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE"),
    ("protein2 with mask", "KALTARQQEVFDLIRD<mask>ISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE"),
    ("protein3", "K A <mask> I S Q"),
]
batch_labels, batch_strs, batch_tokens = batch_converter(data)
batch_lens = (batch_tokens != alphabet.padding_idx).sum(1)

# Extract per-residue representations (on CPU)
with torch.no_grad():
    results = model(batch_tokens, repr_layers=[33], return_contacts=True)
token_representations = results["representations"][33]

# Output: torch.Size([4, 73, 1280])
print(token_representations.shape)

# Generate per-sequence representations via averaging
# NOTE: token 0 is always a beginning-of-sequence token, so the first residue is token 1.
sequence_representations = []
for i, tokens_len in enumerate(batch_lens):
    sequence_representation = token_representations[i, 1: tokens_len - 1].mean(0)

    # Output: torch.Size([1280])
    print(sequence_representation.shape)

    sequence_representations.append(sequence_representation)
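
For the DNABERT-2 side, a minimal sketch of the same masked averaging, assuming the tokenizer is also asked for the attention_mask so that padded positions are excluded (note that special tokens are still included here, unlike the ESM snippet above):

import torch

batch = tokenizer(dna, return_tensors="pt", padding=True)
input_ids, attention_mask = batch["input_ids"], batch["attention_mask"]
hidden_states = model(input_ids)[0]              # [batch, max_tokens, 768]

mask = attention_mask.unsqueeze(-1).float()      # [batch, max_tokens, 1]
sequence_representations = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1)
print(sequence_representations.shape)            # [batch, 768]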

Thanks !!!

RuntimeError: The expanded size of the tensor (673) must match the existing size (512) at non-singleton dimension 1. Target sizes: [8, 673]. Tensor sizes: [1, 512]

I tried to run the fine-tuning. I put the sequences and labels into CSV files, following the format of the sample data. But when I ran it, the error below occurred and I do not know why. Could anyone help me? Thank you.
trainer.train()

File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train return inner_training_loop(

File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/trainer.py", line 2679, in training_step
loss = self.compute_loss(model, inputs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/trainer.py", line 2704, in compute_loss
outputs = model(**inputs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
output.reraise()
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 1562, in forward
outputs = self.bert(
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 988, in forward
buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
RuntimeError: The expanded size of the tensor (673) must match the existing size (512) at non-singleton dimension 1. Target sizes: [8, 673]. Tensor sizes: [1, 512]

It seems that the tensor dimensions do not match, but I did not change any code. The command I ran was: python train.py --model_name_or_path zhihan1996/DNABERT-2-117M --data_path splited/csv_files --kmer -1 --run_name DNABERT2_test1 --model_max_length 700 --per_device_train_batch_size 8 --per_device_eval_batch_size 16 --gradient_accumulation_steps 1 --learning_rate 3e-5 --num_train_epochs 3 --fp16 --save_steps 200 --output_dir output/dnabert2 --evaluation_strategy steps --eval_steps 200 --warmup_steps 50 --logging_steps 100000 --overwrite_output_dir True --log_level info --find_unused_parameters False. I set the max length hyperparameter to 700 because my sequences are long.
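
A hedged observation on the traceback: it goes through transformers/models/bert/modeling_bert.py, i.e. the stock BERT implementation with a 512-position embedding table, rather than the DNABERT-2 remote code (which uses ALiBi and has no such 512-position limit). One thing worth verifying, as an assumption about the cause rather than a confirmed fix, is that the classifier is really instantiated from the remote code:

from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
config.num_labels = 2   # hypothetical label count, adjust to your task
model = AutoModelForSequenceClassification.from_pretrained(
    "zhihan1996/DNABERT-2-117M",
    config=config,
    trust_remote_code=True,   # without this, the stock BertModel (max 512 positions) may be used
)
print(type(model))            # should resolve to a transformers_modules.* class, not transformers.models.bert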

Facing issue while computing embeddings

I used the code for computing embeddings, but it is throwing an error.
dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
hidden_states = model(inputs)[0] # [1, sequence_length, 768]

embedding with mean pooling

embedding_mean = torch.mean(hidden_states[0], dim=0)
print(embedding_mean.shape) # expect to be 768

embedding with max pooling

embedding_max = torch.max(hidden_states[0], dim=0)[0]
print(embedding_max.shape) # expect to be 768

After running this piece of code, I got the following error:

AssertionError Traceback (most recent call last)
in <cell line: 3>()
1 dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
2 inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
----> 3 hidden_states = model(inputs)[0] # [1, sequence_length, 768]
4
5 # embedding with mean pooling

12 frames
~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py in _flash_attn_forward(q, k, v, bias, causal, softmax_scale)
779 assert q.dtype in [torch.float16,
780 torch.bfloat16], 'Only support fp16 and bf16'
--> 781 assert q.is_cuda and k.is_cuda and v.is_cuda
782 softmax_scale = softmax_scale or 1.0 / math.sqrt(d)
783

AssertionError:

Please let me know what I should do.
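
The assertion is raised inside the Triton flash-attention kernel, which only accepts CUDA tensors, while the snippet above runs on the CPU. A minimal sketch of the usual workaround, assuming a GPU is available (without one, this flash-attention path cannot run as written):

import torch

device = "cuda"                       # the Triton kernel requires CUDA tensors
model = model.to(device)
inputs = inputs.to(device)
with torch.no_grad():
    hidden_states = model(inputs)[0]  # [1, sequence_length, 768]
print(hidden_states.shape)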

Cannot import "zhihan1996/DNABERT-2-117M" model from Huggingface

Hi,
As instructed, when I run:

import torch
from transformers import AutoModel
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

I got this error:

ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.81ac6a98387cf94bc283553260f3fa6b88cef2fa.configuration_bert.BertConfig'>. Fix one of those so they match!

And, when I set trust_remote_code=False, I got these warnings:

Some weights of BertModel were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.1.attention.self.key.weight', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.2.attention.self.value.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.embeddings.position_embeddings.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.1.attention.self.query.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.0.attention.self.key.bias', 'bert.encoder.layer.1.attention.self.value.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.1.attention.self.query.bias', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.pooler.dense.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.0.intermediate.dense.weight', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.2.intermediate.dense.weight', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.0.attention.self.value.weight', 'bert.encoder.layer.1.intermediate.dense.weight', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.1.output.LayerNorm.bias', 'bert.encoder.layer.2.attention.self.key.weight', 'bert.encoder.layer.0.output.dense.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.6.output.dense.bias', 
'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.1.attention.self.key.bias', 'bert.encoder.layer.0.attention.self.key.weight', 'bert.encoder.layer.2.output.dense.weight', 'bert.encoder.layer.0.output.LayerNorm.weight', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.1.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.0.attention.self.query.weight', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.0.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.2.attention.self.key.bias', 'bert.encoder.layer.2.attention.self.value.weight', 'bert.encoder.layer.2.attention.self.query.weight', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.0.output.dense.bias', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.1.output.dense.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.2.attention.self.query.bias', 'bert.encoder.layer.0.attention.self.value.bias', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.0.attention.self.query.bias', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.2.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.2.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.1.output.LayerNorm.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.1.output.dense.bias', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.pooler.dense.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.2.output.dense.bias', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.2.output.LayerNorm.weight', 'bert.encoder.layer.0.intermediate.dense.bias', 
'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.1.attention.self.value.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

What should I do to have the pre-trained version of DNABERT-2-117M?
I want to fine-tune the model for another task.

flash_attn_triton.py

When calling:
hidden_states = model(inputs)[0] # [1, sequence_length, 768]

we receive traceback:

in <cell line: 1>:1 │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:608 in forward │
│ │
│ 605 │ │ │ first_col_mask[:, 0] = True │
│ 606 │ │ │ subset_mask = masked_tokens_mask | first_col_mask │
│ 607 │ │ │
│ ❱ 608 │ │ encoder_outputs = self.encoder( │
│ 609 │ │ │ embedding_output, │
│ 610 │ │ │ attention_mask, │
│ 611 │ │ │ output_all_encoded_layers=output_all_encoded_layers, │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:446 in forward │
│ │
│ 443 │ │ all_encoder_layers = [] │
│ 444 │ │ if subset_mask is None: │
│ 445 │ │ │ for layer_module in self.layer: │
│ ❱ 446 │ │ │ │ hidden_states = layer_module(hidden_states, │
│ 447 │ │ │ │ │ │ │ │ │ │ │ cu_seqlens, │
│ 448 │ │ │ │ │ │ │ │ │ │ │ seqlen, │
│ 449 │ │ │ │ │ │ │ │ │ │ │ None, │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:327 in forward │
│ │
│ 324 │ │ │ attn_mask: None or (batch, max_seqlen_in_batch) │
│ 325 │ │ │ bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch) │
│ 326 │ │ """ │
│ ❱ 327 │ │ attention_output = self.attention(hidden_states, cu_seqlens, seqlen, │
│ 328 │ │ │ │ │ │ │ │ │ │ subset_idx, indices, attn_mask, bias) │
│ 329 │ │ layer_output = self.mlp(attention_output) │
│ 330 │ │ return layer_output │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:240 in forward │
│ │
│ 237 │ │ │ attn_mask: None or (batch, max_seqlen_in_batch) │
│ 238 │ │ │ bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch) │
│ 239 │ │ """ │
│ ❱ 240 │ │ self_output = self.self(input_tensor, cu_seqlens, max_s, indices, │
│ 241 │ │ │ │ │ │ │ │ attn_mask, bias) │
│ 242 │ │ if subset_idx is not None: │
│ 243 │ │ │ return self.output(index_first_axis(self_output, subset_idx), │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:181 in forward │
│ │
│ 178 │ │ │ │ qkv = qkv.to(torch.float16) │
│ 179 │ │ │ │ bias_dtype = bias.dtype │
│ 180 │ │ │ │ bias = bias.to(torch.float16) │
│ ❱ 181 │ │ │ │ attention = flash_attn_qkvpacked_func(qkv, bias) │
│ 182 │ │ │ │ attention = attention.to(orig_dtype) │
│ 183 │ │ │ │ bias = bias.to(bias_dtype) │
│ 184 │ │ │ else: │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/autograd/function.py:506 in apply │
│ │
│ 503 │ │ if not torch._C._are_functorch_transforms_active(): │
│ 504 │ │ │ # See NOTE: [functorch vjp and autograd interaction] │
│ 505 │ │ │ args = _functorch.utils.unwrap_dead_wrappers(args) │
│ ❱ 506 │ │ │ return super().apply(*args, **kwargs) # type: ignore[misc] │
│ 507 │ │ │
│ 508 │ │ if cls.setup_context == _SingleLevelFunction.setup_context: │
│ 509 │ │ │ raise RuntimeError( │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/flash_attn_triton.py:1021 in forward │
│ │
│ 1018 │ │ # Make sure that the last dimension is contiguous │
│ 1019 │ │ if qkv.stride(-1) != 1: │
│ 1020 │ │ │ qkv = qkv.contiguous() │
│ ❱ 1021 │ │ o, lse, ctx.softmax_scale = _flash_attn_forward( │
│ 1022 │ │ │ qkv[:, :, 0], │
│ 1023 │ │ │ qkv[:, :, 1], │
│ 1024 │ │ │ qkv[:, :, 2], │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/flash_attn_triton.py:781 in _flash_attn_forward │
│ │
│ 778 │ assert q.dtype == k.dtype == v.dtype, 'All tensors must have the same type' │
│ 779 │ assert q.dtype in [torch.float16, │
│ 780 │ │ │ │ │ torch.bfloat16], 'Only support fp16 and bf16' │
│ ❱ 781 │ assert q.is_cuda and k.is_cuda and v.is_cuda │
│ 782 │ softmax_scale = softmax_scale or 1.0 / math.sqrt(d) │
│ 783 │ │
│ 784 │ has_bias = bias is not None │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError

Error about inconsistent config_class when loading model

Hi,

Thank you so much for updating DNABERT!!!

I'm running into the following error both on the command line and when I try to run the scripts/run_dnabert2.sh script. Do you have any insight into what's going wrong here?

Here's the command line version:

Python 3.8.17 (default, Jul  5 2023, 21:04:15) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nas/longleaf/home/mkratz/.conda/envs/dna/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 487, in from_pretrained
    cls.register(config.__class__, model_class, exist_ok=True)
  File "/nas/longleaf/home/mkratz/.conda/envs/dna/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 513, in register
    raise ValueError(
ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.5fd206e1a13cee3ef4a608677312175eb6f8143d.configuration_bert.BertConfig'>. Fix one of those so they match!

Thanks!!!

Question about pre-trained models

Hello! Congratulations on your work. I was so happy to see this come out!

I have a question about running the GUE tests according to your instructions. In DNABERT-1 you had to physically download the pre-trained model. From looking at the code, it seems the models are now hosted on Hugging Face and I do not need to download the pre-trained models to run these tests. Is that correct?

My goal is to do some further pre-training with bacterial datasets starting from a pre-trained model. To do this, would I load your pre-trained model as a checkpoint and then run a script similar to the pretraining script from DNABERT1?

Thanks for any help you can provide and again, great work!

LeAnn
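
On the continued pre-training question, here is a minimal sketch of the general idea with the Hugging Face Trainer and the masked-language-modeling collator. The dataset handling is purely illustrative (a hypothetical list of sequences), and this is not the repository's official pre-training script.

from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

# Hypothetical bacterial sequences; in practice these would come from your own corpus.
sequences = ["ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"] * 8
encodings = tokenizer(sequences, truncation=True, max_length=128)
dataset = [{"input_ids": ids} for ids in encodings["input_ids"]]

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
args = TrainingArguments(output_dir="dnabert2-continued", per_device_train_batch_size=4,
                         num_train_epochs=1, logging_steps=1)
trainer = Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator)
trainer.train()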

AssertionError: assert q.is_cuda and k.is_cuda and v.is_cuda

I followed the instructions in the README and the following error occurred:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
hidden_states = model(inputs)[0] # [1, sequence_length, 768]

embedding with mean pooling

embedding_mean = torch.mean(hidden_states[0], dim=0)
print(embedding_mean.shape) # expect to be 768

embedding with max pooling

embedding_max = torch.max(hidden_states[0], dim=0)[0]
print(embedding_max.shape) # expect to be 768


/anaconda3/envs/dna/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertModel: ['cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight']

  • This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of BertModel were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

AssertionError Traceback (most recent call last)
Cell In[1], line 9
7 dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
8 inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
----> 9 hidden_states = model(inputs)[0] # [1, sequence_length, 768]
11 # embedding with mean pooling
12 embedding_mean = torch.mean(hidden_states[0], dim=0)

File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:608, in BertModel.forward(self, input_ids, token_type_ids, attention_mask, position_ids, output_all_encoded_layers, masked_tokens_mask, **kwargs)
605 first_col_mask[:, 0] = True
606 subset_mask = masked_tokens_mask | first_col_mask
--> 608 encoder_outputs = self.encoder(
609 embedding_output,
610 attention_mask,
611 output_all_encoded_layers=output_all_encoded_layers,
612 subset_mask=subset_mask)
614 if masked_tokens_mask is None:
615 sequence_output = encoder_outputs[-1]

File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:446, in BertEncoder.forward(self, hidden_states, attention_mask, output_all_encoded_layers, subset_mask)
444 if subset_mask is None:
445 for layer_module in self.layer:
--> 446 hidden_states = layer_module(hidden_states,
447 cu_seqlens,
448 seqlen,
449 None,
450 indices,
451 attn_mask=attention_mask,
452 bias=alibi_attn_mask)
453 if output_all_encoded_layers:
454 all_encoder_layers.append(hidden_states)

File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:327, in BertLayer.forward(self, hidden_states, cu_seqlens, seqlen, subset_idx, indices, attn_mask, bias)
305 def forward(
306 self,
307 hidden_states: torch.Tensor,
(...)
313 bias: Optional[torch.Tensor] = None,
314 ) -> torch.Tensor:
315 """Forward pass for a BERT layer, including both attention and MLP.
316
317 Args:
(...)
325 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
326 """
--> 327 attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
328 subset_idx, indices, attn_mask, bias)
329 layer_output = self.mlp(attention_output)
330 return layer_output

File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:240, in BertUnpadAttention.forward(self, input_tensor, cu_seqlens, max_s, subset_idx, indices, attn_mask, bias)
218 def forward(
219 self,
220 input_tensor: torch.Tensor,
(...)
226 bias: Optional[torch.Tensor] = None,
227 ) -> torch.Tensor:
228 """Forward pass for scaled self-attention without padding.
229
230 Arguments:
(...)
238 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
239 """
--> 240 self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
241 attn_mask, bias)
242 if subset_idx is not None:
243 return self.output(index_first_axis(self_output, subset_idx),
244 index_first_axis(input_tensor, subset_idx))

File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:181, in BertUnpadSelfAttention.forward(self, hidden_states, cu_seqlens, max_seqlen_in_batch, indices, attn_mask, bias)
179 bias_dtype = bias.dtype
180 bias = bias.to(torch.float16)
--> 181 attention = flash_attn_qkvpacked_func(qkv, bias)
182 attention = attention.to(orig_dtype)
183 bias = bias.to(bias_dtype)

File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs)
503 if not torch._C._are_functorch_transforms_active():
504 # See NOTE: [functorch vjp and autograd interaction]
505 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506 return super().apply(*args, **kwargs) # type: ignore[misc]
508 if cls.setup_context == _SingleLevelFunction.setup_context:
509 raise RuntimeError(
510 'In order to use an autograd.Function with functorch transforms '
511 '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
512 'staticmethod. For more details, please see '
513 'https://pytorch.org/docs/master/notes/extending.func.html')

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:1021, in _FlashAttnQKVPackedFunc.forward(ctx, qkv, bias, causal, softmax_scale)
1019 if qkv.stride(-1) != 1:
1020 qkv = qkv.contiguous()
-> 1021 o, lse, ctx.softmax_scale = _flash_attn_forward(
1022 qkv[:, :, 0],
1023 qkv[:, :, 1],
1024 qkv[:, :, 2],
1025 bias=bias,
1026 causal=causal,
1027 softmax_scale=softmax_scale)
1028 ctx.save_for_backward(qkv, o, lse, bias)
1029 ctx.causal = causal

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:781, in _flash_attn_forward(q, k, v, bias, causal, softmax_scale)
778 assert q.dtype == k.dtype == v.dtype, 'All tensors must have the same type'
779 assert q.dtype in [torch.float16,
780 torch.bfloat16], 'Only support fp16 and bf16'
--> 781 assert q.is_cuda and k.is_cuda and v.is_cuda
782 softmax_scale = softmax_scale or 1.0 / math.sqrt(d)
784 has_bias = bias is not None

AssertionError:

config_class error

model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True) triggers an error:

The model class you are passing has a config_class attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.81ac6a98387cf94bc283553260f3fa6b88cef2fa.configuration_bert.BertConfig'>. Fix one of those so they match!

Any idea how to bypass this?
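
One workaround that has helped some users (this is an assumption on my part; the mismatch seems to show up with newer transformers releases) is to load the built-in BertConfig explicitly and pass it in, so the config class and the model's config_class match:

from transformers import AutoModel
from transformers.models.bert.configuration_bert import BertConfig

# load the built-in BertConfig so it matches the config_class declared by the remote model code
config = BertConfig.from_pretrained("zhihan1996/DNABERT-2-117M")
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True, config=config)

If that does not help, pinning transformers to an older release is another option worth trying.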

Another question: after obtaining the token embeddings, is there any way to convert them back into embeddings for each nucleotide? Thanks!
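
On the second question, the tokenizer is BPE over nucleotides, so each token covers a variable number of bases. A rough sketch of one way to do it (my own approach, not an official utility, and it assumes the non-special tokens are plain nucleotide strings so that len(token) equals the number of bases it covers) is to repeat each token's embedding once per base:

import torch

def token_to_nucleotide_embeddings(hidden_states, input_ids, tokenizer):
    """Expand per-token embeddings to per-nucleotide embeddings for one sequence."""
    # hidden_states: [seq_len, hidden_dim]; input_ids: [seq_len]
    tokens = tokenizer.convert_ids_to_tokens(input_ids.tolist())
    per_base = []
    for tok, emb in zip(tokens, hidden_states):
        if tok in tokenizer.all_special_tokens:  # skip [CLS], [SEP], [PAD], ...
            continue
        per_base.append(emb.unsqueeze(0).expand(len(tok), -1))  # one copy per base in this token
    return torch.cat(per_base, dim=0)  # [num_nucleotides, hidden_dim]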

Update requirements with version numbers

Hi,

I just wanted to note that installing requirements.txt in a clean environment pulls in package versions that are not mutually compatible, which leads to errors with triton. Installing some of the versions listed in this discussion fixed it for me.

Thanks again for your work!
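
In the meantime, a simple way to capture a known-good set of pins (plain pip, nothing specific to this repo) is to export them from an environment where fine-tuning already runs cleanly:

# in an environment where fine-tuning runs without triton errors
pip freeze | grep -iE "torch|triton|transformers|peft|accelerate" > working-versions.txt
# reproduce that environment elsewhere
pip install -r working-versions.txt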

RuntimeError: Triton Error [CUDA]: invalid argument when running run_dnabert2.sh

What is the problem here, and how can it be solved?

The provided data_path is /home/shiro/DNABERT_2/finetune
2023-08-31 17:57:18.856636: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/local/cuda/lib64:
2023-08-31 17:57:18.856685: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias']

  • This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of BertForSequenceClassification were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['classifier.weight', 'bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'classifier.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    Using cuda_amp half precision backend
    ***** Running training *****
    Num examples = 36,496
    Num Epochs = 5
    Instantaneous batch size per device = 8
    Total train batch size (w. parallel, distributed & accumulation) = 32
    Gradient Accumulation steps = 4
    Total optimization steps = 5,700
    Number of trainable parameters = 117,070,851
    0%| | 0/5700 [00:00<?, ?it/s]Traceback (most recent call last):
    File "", line 21, in _bwd_kernel
    KeyError: ('2-.-0-.-0-1e8410f206c822547fb50e2ea86e45a6-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-42648570729a4835b21c1c18cebedbfe-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, torch.float16, torch.float32, torch.float16, torch.float32, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, False, True, 128, 128), (True, True, True, True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train_modified.py", line 332, in
train()
File "train_modified.py", line 314, in train
trainer.train()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 2745, in training_step
self.scaler.scale(loss).backward()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/home/shiro/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line 1041, in backward
_flash_attn_backward(do,
File "/home/shiro/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line 949, in _flash_attn_backward
_bwd_kernel[grid]( # type: ignore
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 73, in run
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 73, in
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 63, in _bench
return do_bench(kernel_call)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/testing.py", line 140, in do_bench
fn()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 62, in kernel_call
self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 200, in run
return self.fn.run(*args, **kwargs)
File "", line 43, in _bwd_kernel
RuntimeError: Triton Error [CUDA]: invalid argument
0%| | 0/5700 [00:00<?, ?it/s
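
One workaround that has worked for some users (this is an assumption, not an official fix): the custom model code falls back to a plain PyTorch attention implementation when Triton cannot be imported, so removing Triton from the environment sidesteps the failing kernel at the cost of some throughput:

pip uninstall -y triton
# re-run the fine-tuning script; the model should warn that it is defaulting the
# attention implementation to PyTorch, and training proceeds (more slowly) on the GPU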

Benchmark performance

Is the performance reported in Table 4 the F1 score? I always get a higher score when I run run_dnabert2.sh...
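
If I remember the paper correctly, most GUE results are reported as MCC rather than F1, which could explain a consistently higher number; please double-check against the paper. A quick sketch for computing both metrics from the same predictions (assuming scikit-learn is available; the arrays below are placeholders for your saved evaluation outputs):

from sklearn.metrics import f1_score, matthews_corrcoef

# replace with the labels and predictions saved from your evaluation run
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

print("macro F1:", f1_score(y_true, y_pred, average="macro"))
print("MCC:", matthews_corrcoef(y_true, y_pred))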

Redundancy and issues in the code

Hey,

Congrats on your contribution, but the code has some problems. It may well have worked when you trained the models, but right now there are some severe issues. Please don't get me wrong; I hope you fix them, or give me an opportunity to contribute a fix.

The key issues with DNABERT-2

  1. ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 3168) + inhomogeneous part. (A possible workaround is sketched below.)
  2. The .sh script keeps running even after the error is printed (poor error handling in the script).
  3. The training stops partway through and the code jumps to evaluation, where the ValueError shown above occurs.
  4. There is no epoch tracking, which makes it hard to monitor training progress.

Are you planning to fix these issues? Thanks for the Hugging Face model, by the way; that one seems to be working fine!
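
On point 1, a possible workaround (an assumption on my part: the ragged array appears to come from the evaluation loop trying to stack per-batch outputs of different shapes) is to reduce the model output to fixed-shape class predictions before the Trainer accumulates them:

import torch

def preprocess_logits_for_metrics(logits, labels):
    # some model outputs come back as tuples; keep only the classification logits
    if isinstance(logits, tuple):
        logits = logits[0]
    # return a fixed-shape tensor per batch so the evaluation loop can stack them
    return torch.argmax(logits, dim=-1)

# hypothetical usage: pass this to the Trainer and adapt compute_metrics to expect
# class indices instead of raw logits
# trainer = transformers.Trainer(..., preprocess_logits_for_metrics=preprocess_logits_for_metrics)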

How to get mean embeddings from model

Hi, do you have any examples of how to extract sequence embeddings using this model?

I tried the following code but get an error:

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M")
model = AutoModelForMaskedLM.from_pretrained("zhihan1996/DNABERT-2-117M")

tok = tokenizer("ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC", return_tensors = 'pt')
outs = model(tok)

Gives error below:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:254, in BatchEncoding.__getattr__(self, item)
    253 try:
--> 254     return self.data[item]
    255 except KeyError:

KeyError: 'size'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In[67], line 1
----> 1 outs = model(tok)

File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:1358, in BertForMaskedLM.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, labels, output_attentions, output_hidden_states, return_dict)
   1349 r"""
   1350 labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
   1351     Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
   1352     config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the
   1353     loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`
   1354 """
   1356 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
-> 1358 outputs = self.bert(
   1359     input_ids,
   1360     attention_mask=attention_mask,
   1361     token_type_ids=token_type_ids,
   1362     position_ids=position_ids,
   1363     head_mask=head_mask,
   1364     inputs_embeds=inputs_embeds,
   1365     encoder_hidden_states=encoder_hidden_states,
   1366     encoder_attention_mask=encoder_attention_mask,
   1367     output_attentions=output_attentions,
   1368     output_hidden_states=output_hidden_states,
   1369     return_dict=return_dict,
   1370 )
   1372 sequence_output = outputs[0]
   1373 prediction_scores = self.cls(sequence_output)

File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:968, in BertModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    966     raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    967 elif input_ids is not None:
--> 968     input_shape = input_ids.size()
    969 elif inputs_embeds is not None:
    970     input_shape = inputs_embeds.size()[:-1]

File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:256, in BatchEncoding.__getattr__(self, item)
    254     return self.data[item]
    255 except KeyError:
--> 256     raise AttributeError

AttributeError: 
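
Two things seem to be going on in the snippet above: the whole BatchEncoding is passed to the model instead of the input_ids tensor, and AutoModelForMaskedLM without trust_remote_code loads the stock BERT implementation rather than the DNABERT-2 classes. A sketch of one way to get a mean-pooled sequence embedding (closely following the pattern in the repository's quickstart; treat the exact output shapes as an assumption):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors="pt")["input_ids"]   # pass the tensor, not the whole BatchEncoding
hidden_states = model(inputs)[0]                            # [1, sequence_length, 768]

embedding_mean = torch.mean(hidden_states[0], dim=0)        # mean pooling over tokens -> [768]
embedding_max = torch.max(hidden_states[0], dim=0)[0]       # max pooling, as an alternative -> [768]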
