dnabert_2's People

Contributors

li-hongmin, sudhendra, zhihan1996

dnabert_2's Issues

lora_target_modules default wrong

Hi,

Thanks a lot for uploading DNABERT2.
I have noticed a small mistake when using LoRA: the default lora_target_modules ["query", "value"] do not seem to be correct. Through trial and error, ["q", "v"] worked for me.
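
For reference, a minimal PEFT sketch using the module names reported above as working (the exact names depend on DNABERT-2's custom bert_layers.py, so treat this as an assumption rather than the official configuration):

from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForSequenceClassification.from_pretrained(
    "zhihan1996/DNABERT-2-117M", num_labels=2, trust_remote_code=True
)
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q", "v"],  # "query"/"value" do not match this checkpoint's module names
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()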

Embedding long sequences

Hi,

Thanks for making DNABERT2 available!

I want to prepare embeddings of potentially long sequences for downstream use. How would you recommend I do that?

A) Taking the sequence as-is and embedding it all at once (I guess this is technically possible with ALiBi?)
B) Chunking the sequence into smaller pieces, as done for pre-training, and then concatenating the embeddings (128 nucleotides or 128 BPE tokens? Not sure); a rough chunking sketch for this option follows below.

Would appreciate your help!
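
To make option B concrete, here is a rough sketch of chunk-then-pool embedding (my own assumptions: 128-nucleotide chunks and mean pooling per chunk; not an official recommendation):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model.eval()

def embed_long_sequence(seq, chunk_size=128):
    """Embed a long DNA sequence by chunking over nucleotides (chunk_size is an assumption)."""
    chunk_embeddings = []
    for start in range(0, len(seq), chunk_size):
        chunk = seq[start:start + chunk_size]
        inputs = tokenizer(chunk, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs)[0]              # (1, num_tokens, 768) token embeddings
        chunk_embeddings.append(hidden.mean(dim=1))  # mean-pool tokens -> (1, 768)
    return torch.cat(chunk_embeddings, dim=0)        # (num_chunks, 768), concatenate or average downstream

emb = embed_long_sequence("ACGT" * 1000)
print(emb.shape)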

Functional variants identification with DNABERT-2

Dear Zhihan,

Thank you for the great contribution to the genome foundation model field. I am quite interested in the "Functional variants identification with DNABERT-2" section. However, I am not sure whether the "predicted high-attention regions" come from the pre-trained model or the fine-tuned model. If it is the fine-tuned model, which sub-task is it from?

We applied DNABERT to identify functional variants using around 700 million short variants in dbSNP (Sherry, 2001). Specifically, we selected only those variants that are located inside DNABERT-predicted high-attention regions and repeated the predictions, using sequences with altered alleles.

Thanks.

Shicheng

I think I get the point now. It is from the fine-tuning stage.

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions.

I am trying to run the fine-tuning example from the GitHub repo and I keep getting this error. I am able to successfully run the Quick Start section of the repo.

The provided data_path is /uufs/chpc.utah.edu/common/home/u1323098/sundar-group-space2/PHAGE/MODELS/GUE/prom/prom_300_tata
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
/uufs/chpc.utah.edu/common/home/u1323098/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:125: UserWarning: Unable to import Triton; defaulting MosaicBERT attention implementation to pytorch (this will reduce throughput when using this model).
  warnings.warn(
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['classifier.bias', 'classifier.weight', 'bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using cuda_amp half precision backend
***** Running training *****
  Num examples = 4,904
  Num Epochs = 3
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 462
  Number of trainable parameters = 117,070,082
***** Running Evaluation *****
  Num examples = 613
  Batch size = 32
Traceback (most recent call last):
  File "train.py", line 286, in <module>
    train()
  File "train.py", line 268, in train
    trainer.train()
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2006, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2287, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2993, in evaluate
    output = eval_loop(
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 3281, in evaluation_loop
    metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
  File "train.py", line 204, in compute_metrics
    return calculate_metric_with_sklearn(logits, labels)
  File "train.py", line 190, in calculate_metric_with_sklearn
    predictions = np.argmax(logits, axis=-1)
  File "<__array_function__ internals>", line 200, in argmax
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 1242, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out, **kwds)
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/uufs/chpc.utah.edu/common/home/u1323098/software/pkg/miniconda3/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 613) + inhomogeneous part.

Here are the installation instructions that are working on my system (University of Utah CHPC):

salloc --account=soc-gpu-np --partition=soc-gpu-np --nodes=1 --gres=gpu:a100:1

conda activate dna
cd sundar-group-space2/PHAGE/MODELS/DNABERT_2/
python3 -m pip install -r requirements.txt
pip uninstall triton

Previously, I had changed requirements.txt to read:

einops  
transformers==4.28.0  
peft  
omegaconf  
torch  
evaluate  
accelerate

I also had to change the learning rate from ${lr} to 1e-4 in run_dnabert2.sh.
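
In other words, the edit amounts to giving lr a concrete value before the script forwards it to train.py (a sketch; the rest of run_dnabert2.sh is unchanged):

# in run_dnabert2.sh: set the learning rate explicitly instead of leaving ${lr} unset
lr=1e-4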

Pre-training code

Hi,

Any estimate when the pre-training code will be available?

Thanks

Fine-tuning process

Hi @Zhihan1996, thanks for providing the code for fine-tuning DNABERT-2. However, there is no mention of how to generate the dev.csv, test.csv, and train.csv files from our own dataset, or how to assign the 0/1 labels to the sequences. Could you please let me know how to do that?
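
Until an official answer lands, here is a minimal sketch of what the layout appears to be, judging from the GUE files (my assumption: CSV files with a sequence,label header, one example per row, integer labels):

import csv

# Hypothetical example data: (DNA sequence, binary label)
examples = [
    ("ACGTACGTACGT", 1),
    ("TTGACCTTGACC", 0),
]

# Write train.csv; dev.csv and test.csv follow the same layout.
with open("train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sequence", "label"])  # header assumed by the fine-tuning script
    writer.writerows(examples)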

GUE datasets details

Hi! Very interesting work!
Could you please provide some details about the datasets in GUE?
For example, the Splicing dataset has labels 0, 1, 2, which should correspond to donor, acceptor, and non-splicing site, but which is which?
There are five subfolders in the tf and mouse folders, but which transcription factors correspond to these folders 0-4?

Inference fails with output_all_encoded_layers=True

I am trying to extract the hidden-layer output from every layer of the model. Per the documentation, output_all_encoded_layers is a boolean that controls the content of the encoded_layers output, with default True. However, line 586 (https://huggingface.co/zhihan1996/DNABERT-2-117M/blob/main/bert_layers.py#L586) has it set to False, which matches what I actually observed (only the last layer was returned), in contrast to the documentation. When I set it to True, inference fails. The traceback is as follows:


RuntimeError Traceback (most recent call last)
Cell In[60], line 1
----> 1 output = model(**b, output_all_encoded_layers=True)

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None

File /lustre/scratch124/casm/team113/users/pg20/data/supporting/huggingface_models/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py:616, in BertModel.forward(self, input_ids, token_type_ids, attention_mask, position_ids, output_all_encoded_layers, masked_tokens_mask, **kwargs)
614 if masked_tokens_mask is None:
615 sequence_output = encoder_outputs[-1]
--> 616 pooled_output = self.pooler(
617 sequence_output) if self.pooler is not None else None
618 else:
619 # TD [2022-03-01]: the indexing here is very tricky.
620 attention_mask_bool = attention_mask.bool()

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None

File /lustre/scratch124/casm/team113/users/pg20/data/supporting/huggingface_models/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py:501, in BertPooler.forward(self, hidden_states, pool)
495 def forward(self,
496 hidden_states: torch.Tensor,
497 pool: Optional[bool] = True) -> torch.Tensor:
498 # We "pool" the model by simply taking the hidden state corresponding
499 # to the first token.
500 first_token_tensor = hidden_states[:, 0] if pool else hidden_states
--> 501 pooled_output = self.dense(first_token_tensor)
502 pooled_output = self.activation(pooled_output)
503 return pooled_output

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None

File /lustre/scratch124/casm/team113/users/pg20/venvs/huggingface/lib/python3.10/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x5 and 768x768)

Steps to reproduce the error:
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
b = tokenizer('ATCG', return_tensors='pt', return_attention_mask=True)
output = model(**b, output_all_encoded_layers=True)

P.S. I am not using triton since it was failing in another step.
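
As a possible workaround while output_all_encoded_layers=True is broken, per-layer hidden states can be captured with forward hooks instead (a sketch; it assumes the custom model exposes its Transformer blocks as model.encoder.layer, as the traceback above suggests):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model.eval()

layer_outputs = []
hooks = [
    layer.register_forward_hook(lambda module, inputs, output: layer_outputs.append(output))
    for layer in model.encoder.layer  # assumed attribute path to the encoder blocks
]

batch = tokenizer("ATCG", return_tensors="pt")
with torch.no_grad():
    model(**batch)  # default output_all_encoded_layers=False, so the pooler path is untouched

for h in hooks:
    h.remove()

print(len(layer_outputs))  # one hidden-state tensor per encoder layer (possibly unpadded)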

Version and compatibility for Ubuntu

We tried to test DNABERT-2 on an AWS EC2 p2.xlarge instance with Ubuntu, CUDA 11.5, and gcc version 9 (we also tried version 11).
Every attempt failed.
We set up the environment using the requirements.txt posted on GitHub, but it still did not work.
The trouble comes with the command: hidden_states = model(inputs)[0] # [1, sequence_length, 768]

>>> hidden_states = model(inputs)[0] # [1, sequence_length, 768]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Traceback (most recent call last):
  File "<string>", line 21, in _fwd_kernel
KeyError: ('2-.-0-.-0--d6252949da17ceb5f3a278a70250af13-3b85c7bef5f0a641282f3b73af50f599-14de7de5c4da5794c8ca14e7e41a122d-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.float16, torch.float16, torch.float16, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, True, 128, 128), (True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (True, False), (True, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 937, in build_triton_ir
    generator.visit(fn.parse())
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 183, in visit_Module
    ast.NodeVisitor.generic_visit(self, node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 379, in generic_visit
    self.visit(item)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 252, in visit_FunctionDef
    has_ret = self.visit_compound_statement(node.body)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 177, in visit_compound_statement
    self.last_ret_type = self.visit(stmt)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 678, in visit_For
    self.visit_compound_statement(node.body)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 177, in visit_compound_statement
    self.last_ret_type = self.visit(stmt)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 319, in visit_AugAssign
    self.visit(assign)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 301, in visit_Assign
    values = self.visit(node.value)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 339, in visit_BinOp
    rhs = self.visit(node.right)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 855, in visit
    return super().visit(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/ast.py", line 371, in visit
    return visitor(node)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 797, in visit_Call
    return fn(*args, _builder=self.builder, **kws)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/impl/base.py", line 22, in wrapper
    return fn(*args, **kwargs)
TypeError: dot() got an unexpected keyword argument 'trans_b'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 608, in forward
    encoder_outputs = self.encoder(
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 446, in forward
    hidden_states = layer_module(hidden_states,
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 327, in forward
    attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 240, in forward
    self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/bert_layers.py", line 181, in forward
    attention = flash_attn_qkvpacked_func(qkv, bias)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line1021, in forward
    o, lse, ctx.softmax_scale = _flash_attn_forward(
  File "/root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line826, in _flash_attn_forward
    _fwd_kernel[grid](  # type: ignore
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 90, in run
    return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 199, in run
    return self.fn.run(*args, **kwargs)
  File "<string>", line 41, in _fwd_kernel
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 1621, in compile
    next_module = compile(module)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 1550, in <lambda>
    lambda src: ast_to_ttir(src, signature, configs[0], constants)),
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 962, in ast_to_ttir
    mod, _ = build_triton_ir(fn, signature, specialization, constants)
  File "/root/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 942, in build_triton_ir
    raise CompilationError(fn.src, node) from e
triton.compiler.CompilationError: at 114:24:
def _fwd_kernel(
    Q,
    K,
    V,
    Bias,
    Out,
    Lse,
    TMP,  # NOTE: TMP is a scratchpad buffer to workaround a compiler bug
    softmax_scale,
    stride_qb,
    stride_qh,
    stride_qm,
    stride_kb,
    stride_kh,
    stride_kn,
    stride_vb,
    stride_vh,
    stride_vn,
    stride_bb,
    stride_bh,
    stride_bm,
    stride_ob,
    stride_oh,
    stride_om,
    nheads,
    seqlen_q,
    seqlen_k,
    seqlen_q_rounded,
    headdim,
    CACHE_KEY_SEQLEN_Q,
    CACHE_KEY_SEQLEN_K,
    BIAS_TYPE: tl.constexpr,
    IS_CAUSAL: tl.constexpr,
    BLOCK_HEADDIM: tl.constexpr,
    EVEN_M: tl.constexpr,
    EVEN_N: tl.constexpr,
    EVEN_HEADDIM: tl.constexpr,
    BLOCK_M: tl.constexpr,
    BLOCK_N: tl.constexpr,
):
    start_m = tl.program_id(0)
    off_hb = tl.program_id(1)
    off_b = off_hb // nheads
    off_h = off_hb % nheads
    # off_b = tl.program_id(1)
    # off_h = tl.program_id(2)
    # off_hb = off_b * nheads + off_h
    # initialize offsets
    offs_m = start_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_d = tl.arange(0, BLOCK_HEADDIM)
    # Initialize pointers to Q, K, V
    # Adding parenthesis around indexing might use int32 math instead of int64 math?
    # https://github.com/openai/triton/issues/741
    # I'm seeing a tiny bit of difference (5-7us)
    q_ptrs = Q + off_b * stride_qb + off_h * stride_qh + (
        offs_m[:, None] * stride_qm + offs_d[None, :])
    k_ptrs = K + off_b * stride_kb + off_h * stride_kh + (
        offs_n[:, None] * stride_kn + offs_d[None, :])
    v_ptrs = V + off_b * stride_vb + off_h * stride_vh + (
        offs_n[:, None] * stride_vn + offs_d[None, :])
    if BIAS_TYPE == 'vector':
        b_ptrs = Bias + off_b * stride_bb + off_h * stride_bh + offs_n
    elif BIAS_TYPE == 'matrix':
        b_ptrs = Bias + off_b * stride_bb + off_h * stride_bh + (
            offs_m[:, None] * stride_bm + offs_n[None, :])
    else:
        raise ValueError("BIAS_TYPE must be one of {'vector', 'matrix'}")
    # initialize pointer to m and l
    t_ptrs = TMP + off_hb * seqlen_q_rounded + offs_m
    lse_i = tl.zeros([BLOCK_M], dtype=tl.float32) - float('inf')
    m_i = tl.zeros([BLOCK_M], dtype=tl.float32) - float('inf')
    acc_o = tl.zeros([BLOCK_M, BLOCK_HEADDIM], dtype=tl.float32)
    # load q: it will stay in SRAM throughout
    # [2022-10-30] TD: Triton bug - in the case of EVEN_M=True and EVEN_N=False, if we just call
    # tl.load(q_ptrs), we get the wrong output!
    if EVEN_M & EVEN_N:
        if EVEN_HEADDIM:
            q = tl.load(q_ptrs)
        else:
            q = tl.load(q_ptrs, mask=offs_d[None, :] < headdim, other=0.0)
    else:
        if EVEN_HEADDIM:
            q = tl.load(q_ptrs, mask=offs_m[:, None] < seqlen_q, other=0.0)
        else:
            q = tl.load(q_ptrs,
                        mask=(offs_m[:, None] < seqlen_q) &
                        (offs_d[None, :] < headdim),
                        other=0.0)
    # loop over k, v and update accumulator
    end_n = seqlen_k if not IS_CAUSAL else tl.minimum(
        (start_m + 1) * BLOCK_M, seqlen_k)
    for start_n in range(0, end_n, BLOCK_N):
        start_n = tl.multiple_of(start_n, BLOCK_N)
        # -- compute qk ----
        if EVEN_N & EVEN_M:  # If we just do "if EVEN_N", there seems to be some race condition
            if EVEN_HEADDIM:
                k = tl.load(k_ptrs + start_n * stride_kn)
            else:
                k = tl.load(k_ptrs + start_n * stride_kn,
                            mask=offs_d[None, :] < headdim,
                            other=0.0)
        else:
            if EVEN_HEADDIM:
                k = tl.load(k_ptrs + start_n * stride_kn,
                            mask=(start_n + offs_n)[:, None] < seqlen_k,
                            other=0.0)
            else:
                k = tl.load(k_ptrs + start_n * stride_kn,
                            mask=((start_n + offs_n)[:, None] < seqlen_k) &
                            (offs_d[None, :] < headdim),
                            other=0.0)
        qk = tl.zeros([BLOCK_M, BLOCK_N], dtype=tl.float32)
        qk += tl.dot(q, k, trans_b=True)
                        ^
>>>
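
One workaround reported elsewhere in this tracker (see the CHPC install notes above, and the fine-tuning log's warning about defaulting the MosaicBERT attention implementation to PyTorch) is to remove Triton so the model never enters the Triton FlashAttention code path:

pip uninstall -y triton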

Pre-training code?

Hello, I was just wondering if you know when you will make the pre-training code available on your GitHub? Also, is it very different from, or similar to, the pre-training code you provided for the original DNABERT?

Thank you for any assistance.
LeAnn

Multi-GPU support

Would the authors consider adding an implementation of _no_split_modules to offer multi-GPU options in Lightning for those wanting to use the embeddings?

Despite multiple trials and examining the model configuration, it seems that the model hosted on Hugging Face (`huggingface.co`) cannot handle sequences that exceed a length of 512 tokens. I've provided the relevant code below for clarity.

Part 1: Tokenization and Dataset Preparation

from transformers import AutoTokenizer, BertForSequenceClassification
from torch.utils.data import Dataset

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = BertForSequenceClassification.from_pretrained("zhihan1996/DNABERT-2-117M", num_labels=8)

class DNADataset(Dataset):
    def __init__(self, data, tokenizer):
        self.data = data
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        seq, label = self.data[idx]
        inputs = self.tokenizer(seq, return_tensors='pt', padding='max_length', max_length=600, truncation=True)
        return {
            'input_ids': inputs["input_ids"].squeeze(),
            'label': label
        }

Part 2: Retrieving Model Configuration

from transformers import AutoConfig

config = AutoConfig.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
print(config.max_position_embeddings)
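
For what it's worth, a possible workaround (a sketch, assuming the checkpoint's auto_map exposes the custom ALiBi-based classes for sequence classification, as the fine-tuning logs elsewhere in this tracker suggest) is to load the classifier with trust_remote_code=True, so the repo's own bert_layers.py is used instead of the stock BERT class with 512 absolute position embeddings:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Sketch: trust_remote_code=True pulls the custom ALiBi implementation from the hub repo,
# which does not rely on max_position_embeddings=512 absolute positions.
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "zhihan1996/DNABERT-2-117M", num_labels=8, trust_remote_code=True
)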

Sequences longer than 512 bases, and evaluation examples on our own dataset

Firstly, thank you so much for this implementation - this is really useful!

Is there still an input length constraint in the pretrained model, though? I noticed that when I feed in a sequence that generates more than 512 tokens (the original maximum BERT input length), the model fails. Is this expected behaviour? The error is given below.

If yes, then would you have any recommendations for dealing with sequences that generate more than 512 tokens?

Cell In[73], line 1
----> 1 dnabert(input_ids, attention_mask, token_type_ids)

File ~/predictor/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/predictor/venv/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:1015, in BertModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
   1008 # Prepare head mask if needed
   1009 # 1.0 in head_mask indicate we keep the head
   1010 # attention_probs has shape bsz x n_heads x N x N
   1011 # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
   1012 # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
   1013 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
-> 1015 embedding_output = self.embeddings(
   1016     input_ids=input_ids,
   1017     position_ids=position_ids,
   1018     token_type_ids=token_type_ids,
   1019     inputs_embeds=inputs_embeds,
   1020     past_key_values_length=past_key_values_length,
   1021 )
   1022 encoder_outputs = self.encoder(
   1023     embedding_output,
   1024     attention_mask=extended_attention_mask,
   (...)
   1032     return_dict=return_dict,
   1033 )
   1034 sequence_output = encoder_outputs[0]

File ~/predictor/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/predictor/venv/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:238, in BertEmbeddings.forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    236 if self.position_embedding_type == "absolute":
    237     position_embeddings = self.position_embeddings(position_ids)
--> 238     embeddings += position_embeddings
    239 embeddings = self.LayerNorm(embeddings)
    240 embeddings = self.dropout(embeddings)

RuntimeError: The size of tensor a (514) must match the size of tensor b (512) at non-singleton dimension 1

Also, would you have any examples of fine-tuning on our own datasets or on a regression dataset coming up soon?

Get logits and LM loss from the DNABert

Hi, I tried to get the logits for an input sequence using DNABERT-2; here is my code:

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model.eval().cuda()

seq = "TCCCACTATTTGTCGGCTAGCCAGATTGTTGTGGTCTGATTAAAGTT\
TCAATTTATACCTTACAATGATGTAAGGTACGTGTAAGAGAAATCGATGGGATA\
TTTTTTTACAACAAGGTATTCTTAAAGTAAGAGTTATACGCTATGTGGAAAAGAGGTGTTTAAG"

tokens_ids = tokenizer.batch_encode_plus([seq], return_tensors="pt")["input_ids"]
attention_mask = tokens_ids != tokenizer.pad_token_id

torch_outs = model(tokens_ids.cuda(),
                   attention_mask=attention_mask.cuda(),
                   encoder_attention_mask=attention_mask.cuda(),
                   output_hidden_states=True,
                   labels=tokens_ids.cuda())

The returned torch_outs is a tuple of two tensors: torch.Size([1, 39, 768]) and torch.Size([1, 768]). I assume the first one holds the embeddings for each token and the second is a pooled embedding of the whole sequence?

Is it possible for DNABERT-2 to return the logits for each token?
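
For per-token logits, something along these lines may work (a sketch; it assumes the checkpoint exposes a masked-LM head via AutoModelForMaskedLM, which is not guaranteed):

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
mlm_model = AutoModelForMaskedLM.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
mlm_model.eval()

seq = "TCCCACTATTTGTCGGCTAGCCAGATTGTTGTGGTCTGATTAAAGTT"
inputs = tokenizer(seq, return_tensors="pt")
with torch.no_grad():
    out = mlm_model(**inputs, labels=inputs["input_ids"])

# Assumes a MaskedLMOutput-style return; adjust the indexing if a plain tuple comes back.
print(out.logits.shape)  # (batch, num_tokens, vocab_size): per-token logits
print(out.loss)          # language-modeling loss against the provided labels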

ALiBi not working

Hi, I tried following the README with

from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M")
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M")

This does not work correctly with ALiBi, as it does not use the model class defined at https://huggingface.co/zhihan1996/DNABERT-2-117M/blob/main/bert_layers.py
Instead, it uses the default Hugging Face BERT class, which, if I understand correctly, does not have the ALiBi utilities such as rebuild_alibi_tensor.

I encountered the error by trying to run a sequence longer than 512 tokens. Should the model be loaded in another way so that ALiBi works correctly?
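
For what it's worth, loading with trust_remote_code=True (as in the Quick Start and the other issues on this page) makes AutoModel pull the custom bert_layers.py from the hub, which is the ALiBi-aware class:

from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True loads the repo's own BertModel (ALiBi attention) instead of
# the stock Hugging Face BERT class that lacks rebuild_alibi_tensor.
tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)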

Downstream Association Study

Hi Zhihan,

I am wondering what you think about downstream association studies that identify phenotype-associated variants/regions with a genome foundation model. Do you think the current model is suitable for such a task? If not, do you have any comments on how to develop a good model for it?

Thanks

Shicheng

Issue regarding the config_class

ValueError Traceback (most recent call last)
in <cell line: 1>()
----> 1 model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

1 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py in register(cls, config_class, model_class, exist_ok)
534 """
535 if hasattr(model_class, "config_class") and model_class.config_class != config_class:
--> 536 raise ValueError(
537 "The model class you are passing has a config_class attribute that is not consistent with the "
538 f"config class you passed (model has {model_class.config_class} and you passed {config_class}. Fix "

ValueError: The model class you are passing has a config_class attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.81ac6a98387cf94bc283553260f3fa6b88cef2fa.configuration_bert.BertConfig'>. Fix one of those so they match!

Can you please help me out?

ValueError

Hi, I tried to run fine-tuning with train.py for the first time. I got an error at the following lines:
parser = transformers.HfArgumentParser((ModelArguments, DataArguments, TrainingArguments))
model_args, data_args, training_args = parser.parse_args_into_dataclasses()

Below is the error message I got:

  ValueError: (Field(name=None,type=None,default='steps',default_factory=<dataclasses._MISSING_TYPE object at 0x0000016635086850>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=None),) is not a valid IntervalStrategy
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "C:\ProgramData\MiniConda\envs\dnabert\lib\enum.py", line 670, in __new__
      raise exc
    File "C:\ProgramData\MiniConda\envs\dnabert\lib\enum.py", line 653, in __new__
      result = cls._missing_(value)
    File "C:\ProgramData\MiniConda\envs\dnabert\lib\site-packages\transformers\utils\generic.py", line 348, in _missing_
      raise ValueError(
  ValueError: (Field(name=None,type=None,default='steps',default_factory=<dataclasses._MISSING_TYPE object at 0x0000016635086850>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=None),) is not a valid IntervalStrategy, please select one of ['no', 'steps', 'epoch']

Any idea how to fix that?
Thank you!
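
If it helps while debugging, an explicit command line along these lines sidesteps any unset defaults (a sketch; the flag names assume the standard TrainingArguments plus the model_name_or_path/data_path fields visible in the logs elsewhere in this tracker, and the data path shown is just an example):

python train.py \
    --model_name_or_path zhihan1996/DNABERT-2-117M \
    --data_path ./GUE/prom/prom_300_tata \
    --output_dir ./output \
    --evaluation_strategy steps \
    --save_strategy steps \
    --num_train_epochs 3 \
    --per_device_train_batch_size 32 \
    --learning_rate 1e-4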

StopIteration: Caught StopIteration in replica 0 on device 0.

Hello,

I have come across the following error when trying to run this model on Bridges-2 using 8 GPUs. I set up a fresh conda environment as detailed in the README.md.

StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/jet/home/ahabib/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/jet/home/ahabib/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/jet/home/ahabib/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 862, in forward
    outputs = self.bert(
  File "/jet/home/ahabib/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/jet/home/ahabib/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 608, in forward
    encoder_outputs = self.encoder(
  File "/jet/home/ahabib/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/jet/home/ahabib/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 416, in forward
    dtype=next(self.parameters()).dtype)  # fp16 compatibility
StopIteration

  0%|          | 0/5721 [00:10<?, ?it/s]++ date '+%Y-%m-%d %T'

I have attached my full output and batch script below! Thank you!

output.txt
train.txt

Error while loading the model

Thanks for the implementation of DNABERT-2. I had a small issue; could you please help fix it?

When I run model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True), I get the following error. Any idea how to fix it?

ValueError: The model class you are passing has a config_class attribute that is not consistent with the config class you passed (model has and you passed . Fix one of those so they match!

Pre-training Data

In the paper you state, "In order to facilitate further research on large-scale genome foundational models, we have collated and made available multi-species genome datasets for both pre-training of models (Sec. 4.1) and benchmarking (Sec. 4.2)."

but I cannot see where these datasets are; I have looked on both Hugging Face and your GitHub.

Have I overlooked them somewhere?

GUE: Splice site recognition in regards to sequence labeling/NER vs classification

Hi Zhihan,

Hope you're doing well. I had a wonderful time reading your paper on the improvements made to DNABERT, and I am trying to set up a language-model evaluation framework.

However, I question whether the labels for the splice site recognition benchmark truly represent how a user would approach this task. Going through the train/dev/test sets, I noticed that the labels only annotate whether the entire sequence is {no splice site, donor, acceptor}. Current state-of-the-art tools, such as SpliceAI, provide further information, similar to an NER task, by labeling every nucleotide within a given sequence. Has DNABERT-2 previously been evaluated on splice sites as an NER/sequence-labeling task?

CompilationError: at 114:24:

Epoch [1/3]

KeyError Traceback (most recent call last)
File <string>:21, in _fwd_kernel(Q, K, V, Bias, Out, Lse, TMP, softmax_scale, stride_qb, stride_qh, stride_qm, stride_kb, stride_kh, stride_kn, stride_vb, stride_vh, stride_vn, stride_bb, stride_bh, stride_bm, stride_ob, stride_oh, stride_om, nheads, seqlen_q, seqlen_k, seqlen_q_rounded, headdim, CACHE_KEY_SEQLEN_Q, CACHE_KEY_SEQLEN_K, BIAS_TYPE, IS_CAUSAL, BLOCK_HEADDIM, EVEN_M, EVEN_N, EVEN_HEADDIM, BLOCK_M, BLOCK_N, grid, num_warps, num_stages, extern_libs, stream, warmup)

KeyError: ('2-.-0-.-0--d6252949da17ceb5f3a278a70250af13-3b85c7bef5f0a641282f3b73af50f599-14de7de5c4da5794c8ca14e7e41a122d-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.float16, torch.float16, torch.float16, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, True, 128, 128), (True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (False, False), (False, False)))

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:937, in build_triton_ir(fn, signature, specialization, constants)
936 try:
--> 937 generator.visit(fn.parse())
938 except Exception as e:

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:183, in CodeGenerator.visit_Module(self, node)
182 def visit_Module(self, node):
--> 183 ast.NodeVisitor.generic_visit(self, node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:379, in NodeVisitor.generic_visit(self, node)
378 if isinstance(item, AST):
--> 379 self.visit(item)
380 elif isinstance(value, AST):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:252, in CodeGenerator.visit_FunctionDef(self, node)
251 # visit function body
--> 252 has_ret = self.visit_compound_statement(node.body)
253 # finalize function

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:177, in CodeGenerator.visit_compound_statement(self, stmts)
176 for stmt in stmts:
--> 177 self.last_ret_type = self.visit(stmt)
178 if isinstance(stmt, ast.Return):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:678, in CodeGenerator.visit_For(self, node)
677 self.scf_stack.append(node)
--> 678 self.visit_compound_statement(node.body)
679 self.scf_stack.pop()

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:177, in CodeGenerator.visit_compound_statement(self, stmts)
176 for stmt in stmts:
--> 177 self.last_ret_type = self.visit(stmt)
178 if isinstance(stmt, ast.Return):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:319, in CodeGenerator.visit_AugAssign(self, node)
318 assign = ast.Assign(targets=[node.target], value=rhs)
--> 319 self.visit(assign)
320 return self.get_value(name)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:301, in CodeGenerator.visit_Assign(self, node)
300 names = _names[0]
--> 301 values = self.visit(node.value)
302 if not isinstance(names, tuple):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:339, in CodeGenerator.visit_BinOp(self, node)
338 lhs = self.visit(node.left)
--> 339 rhs = self.visit(node.right)
340 fn = {
341 ast.Add: 'add',
342 ast.Sub: 'sub',
(...)
352 ast.BitXor: 'xor',
353 }[type(node.op)]

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node)
854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8
--> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node)
370 visitor = getattr(self, method, self.generic_visit)
--> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:797, in CodeGenerator.visit_Call(self, node)
795 if (hasattr(fn, 'self') and self.is_triton_tensor(fn.self))
796 or impl.is_builtin(fn):
--> 797 return fn(*args, _builder=self.builder, **kws)
798 if fn in self.builtins.values():

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/impl/base.py:22, in builtin.<locals>.wrapper(*args, **kwargs)
18 raise ValueError(
19 "Did you forget to add @triton.jit ? "
20 "(_builder argument must be provided outside of JIT functions.)"
21 )
---> 22 return fn(*args, **kwargs)

TypeError: dot() got an unexpected keyword argument 'trans_b'

The above exception was the direct cause of the following exception:

CompilationError Traceback (most recent call last)
Cell In[15], line 1
----> 1 teacher_train(T_model, cfg, train_loader, test_loader)

Cell In[14], line 39, in teacher_train(model, config, train_loader, test_loader)
37 mask = mask.to(config.device)
38 labels = labels.to(config.device)
---> 39 outputs = model(ids, mask)
40 model.zero_grad()
41 loss = F.cross_entropy(outputs, labels)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

Cell In[12], line 12, in BERT_Model.forward(self, context, mask)
11 def forward(self, context, mask):
---> 12 outputs = self.bert(context, attention_mask=mask)
13 pooled = outputs[1]
14 out = self.fc(pooled)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:608, in BertModel.forward(self, input_ids, token_type_ids, attention_mask, position_ids, output_all_encoded_layers, masked_tokens_mask, **kwargs)
605 first_col_mask[:, 0] = True
606 subset_mask = masked_tokens_mask | first_col_mask
--> 608 encoder_outputs = self.encoder(
609 embedding_output,
610 attention_mask,
611 output_all_encoded_layers=output_all_encoded_layers,
612 subset_mask=subset_mask)
614 if masked_tokens_mask is None:
615 sequence_output = encoder_outputs[-1]

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:446, in BertEncoder.forward(self, hidden_states, attention_mask, output_all_encoded_layers, subset_mask)
444 if subset_mask is None:
445 for layer_module in self.layer:
--> 446 hidden_states = layer_module(hidden_states,
447 cu_seqlens,
448 seqlen,
449 None,
450 indices,
451 attn_mask=attention_mask,
452 bias=alibi_attn_mask)
453 if output_all_encoded_layers:
454 all_encoder_layers.append(hidden_states)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:327, in BertLayer.forward(self, hidden_states, cu_seqlens, seqlen, subset_idx, indices, attn_mask, bias)
305 def forward(
306 self,
307 hidden_states: torch.Tensor,
(...)
313 bias: Optional[torch.Tensor] = None,
314 ) -> torch.Tensor:
315 """Forward pass for a BERT layer, including both attention and MLP.
316
317 Args:
(...)
325 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
326 """
--> 327 attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
328 subset_idx, indices, attn_mask, bias)
329 layer_output = self.mlp(attention_output)
330 return layer_output

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:240, in BertUnpadAttention.forward(self, input_tensor, cu_seqlens, max_s, subset_idx, indices, attn_mask, bias)
218 def forward(
219 self,
220 input_tensor: torch.Tensor,
(...)
226 bias: Optional[torch.Tensor] = None,
227 ) -> torch.Tensor:
228 """Forward pass for scaled self-attention without padding.
229
230 Arguments:
(...)
238 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
239 """
--> 240 self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
241 attn_mask, bias)
242 if subset_idx is not None:
243 return self.output(index_first_axis(self_output, subset_idx),
244 index_first_axis(input_tensor, subset_idx))

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:181, in BertUnpadSelfAttention.forward(self, hidden_states, cu_seqlens, max_seqlen_in_batch, indices, attn_mask, bias)
179 bias_dtype = bias.dtype
180 bias = bias.to(torch.float16)
--> 181 attention = flash_attn_qkvpacked_func(qkv, bias)
182 attention = attention.to(orig_dtype)
183 bias = bias.to(bias_dtype)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs)
503 if not torch._C._are_functorch_transforms_active():
504 # See NOTE: [functorch vjp and autograd interaction]
505 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506 return super().apply(*args, **kwargs) # type: ignore[misc]
508 if cls.setup_context == _SingleLevelFunction.setup_context:
509 raise RuntimeError(
510 'In order to use an autograd.Function with functorch transforms '
511 '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
512 'staticmethod. For more details, please see '
513 'https://pytorch.org/docs/master/notes/extending.func.html')

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:1021, in _FlashAttnQKVPackedFunc.forward(ctx, qkv, bias, causal, softmax_scale)
1019 if qkv.stride(-1) != 1:
1020 qkv = qkv.contiguous()
-> 1021 o, lse, ctx.softmax_scale = _flash_attn_forward(
1022 qkv[:, :, 0],
1023 qkv[:, :, 1],
1024 qkv[:, :, 2],
1025 bias=bias,
1026 causal=causal,
1027 softmax_scale=softmax_scale)
1028 ctx.save_for_backward(qkv, o, lse, bias)
1029 ctx.causal = causal

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:826, in _flash_attn_forward(q, k, v, bias, causal, softmax_scale)
823 # BLOCK = 128
824 # num_warps = 4 if d <= 64 else 8
825 grid = lambda META: (triton.cdiv(seqlen_q, META['BLOCK_M']), batch * nheads)
--> 826 _fwd_kernel[grid]( # type: ignore
827 q,
828 k,
829 v,
830 bias,
831 o,
832 lse,
833 tmp,
834 softmax_scale,
835 q.stride(0),
836 q.stride(2),
837 q.stride(1),
838 k.stride(0),
839 k.stride(2),
840 k.stride(1),
841 v.stride(0),
842 v.stride(2),
843 v.stride(1),
844 *bias_strides,
845 o.stride(0),
846 o.stride(2),
847 o.stride(1),
848 nheads,
849 seqlen_q,
850 seqlen_k,
851 seqlen_q_rounded,
852 d,
853 seqlen_q // 32,
854 seqlen_k // 32, # key for triton cache (limit number of compilations)
855 # Can't use kwargs here because triton autotune expects key to be args, not kwargs
856 # IS_CAUSAL=causal, BLOCK_HEADDIM=d,
857 bias_type,
858 causal,
859 BLOCK_HEADDIM,
860 # BLOCK_M=BLOCK, BLOCK_N=BLOCK,
861 # num_warps=num_warps,
862 # num_stages=1,
863 )
864 return o, lse, softmax_scale

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/runtime/autotuner.py:90, in Autotuner.run(self, *args, **kwargs)
88 if config.pre_hook is not None:
89 config.pre_hook(self.nargs)
---> 90 return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/runtime/autotuner.py:199, in Heuristics.run(self, *args, **kwargs)
197 for v, heur in self.values.items():
198 kwargs[v] = heur({**dict(zip(self.arg_names, args)), **kwargs})
--> 199 return self.fn.run(*args, **kwargs)

File :41, in _fwd_kernel(Q, K, V, Bias, Out, Lse, TMP, softmax_scale, stride_qb, stride_qh, stride_qm, stride_kb, stride_kh, stride_kn, stride_vb, stride_vh, stride_vn, stride_bb, stride_bh, stride_bm, stride_ob, stride_oh, stride_om, nheads, seqlen_q, seqlen_k, seqlen_q_rounded, headdim, CACHE_KEY_SEQLEN_Q, CACHE_KEY_SEQLEN_K, BIAS_TYPE, IS_CAUSAL, BLOCK_HEADDIM, EVEN_M, EVEN_N, EVEN_HEADDIM, BLOCK_M, BLOCK_N, grid, num_warps, num_stages, extern_libs, stream, warmup)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:1621, in compile(fn, **kwargs)
1619 next_module = parse(path)
1620 else:
-> 1621 next_module = compile(module)
1622 fn_cache_manager.put(next_module, f"{name}.{ir}")
1623 if os.path.exists(path):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:1550, in compile..(src)
1545 extern_libs = kwargs.get("extern_libs", dict())
1546 # build compilation stages
1547 stages = {
1548 "ast": (lambda path: fn, None),
1549 "ttir": (lambda path: parse_mlir_module(path, context),
-> 1550 lambda src: ast_to_ttir(src, signature, configs[0], constants)),
1551 "ttgir": (lambda path: parse_mlir_module(path, context),
1552 lambda src: ttir_to_ttgir(src, num_warps, num_stages, capability)),
1553 "llir": (lambda path: Path(path).read_text(),
1554 lambda src: ttgir_to_llir(src, extern_libs, capability)),
1555 "ptx": (lambda path: Path(path).read_text(),
1556 lambda src: llir_to_ptx(src, capability)),
1557 "cubin": (lambda path: Path(path).read_bytes(),
1558 lambda src: ptx_to_cubin(src, capability))
1559 }
1560 # find out the signature of the function
1561 if isinstance(fn, triton.runtime.JITFunction):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:962, in ast_to_ttir(fn, signature, specialization, constants)
961 def ast_to_ttir(fn, signature, specialization, constants):
--> 962 mod, _ = build_triton_ir(fn, signature, specialization, constants)
963 return optimize_triton_ir(mod)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:942, in build_triton_ir(fn, signature, specialization, constants)
940 if node is None or isinstance(e, (NotImplementedError, CompilationError)):
941 raise e
--> 942 raise CompilationError(fn.src, node) from e
943 ret = generator.module
944 # module takes ownership of the context

Train Dev Test Splits for GUE Benchmark Differ from Reported Paper Splits

Hi Zhihan,

Thank you very much for your hard work on this repo, much appreciated!

I have been trying to recreate the results of DNABERT-2 on the COVID variant prediction task from the GUE benchmark, and I have noticed a discrepancy with the train/validation/test split sizes reported in the paper.

In the paper, it is reported that the breakdown for these splits is: 77669 / 7000 / 7000.
However, using the files provided here, I get a breakdown for the splits as: 73335 / 9168 / 9168.

I haven't checked any of the other benchmark datasets but this may be an issue with others too.

Is it possible for you to fix the datasets provided to the correct splits given in the paper? This will allow others to recreate and validate your results.
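
To make the comparison concrete, here is a minimal sketch for counting the examples in each split. The folder name below is a hypothetical placeholder for wherever the covid task lives in a local copy of GUE; adjust it to your layout.

import pandas as pd

# Hypothetical local path to the extracted GUE covid task; adjust as needed.
data_dir = "GUE/virus/covid"

for split in ["train", "dev", "test"]:
    df = pd.read_csv(f"{data_dir}/{split}.csv")
    print(split, len(df))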

install triton

Why not install Triton by simply running pip install triton?
It seems that installing Triton from source runs into errors that are difficult to resolve.

TypeError: __init__() got an unexpected keyword argument 'approximate'

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

Hello, when I ran the above code, the following error was reported. Can you help me figure out what the reason is?


TypeError Traceback (most recent call last)
Cell In[6], line 5
2 from transformers import AutoTokenizer, AutoModel
4 tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
----> 5 model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py:479, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
475 model_class = get_class_from_dynamic_module(
476 class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs
477 )
478 _ = hub_kwargs.pop("code_revision", None)
--> 479 return model_class.from_pretrained(
480 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
481 )
482 elif type(config) in cls._model_mapping.keys():
483 model_class = _get_model_class(config, cls._model_mapping)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/transformers/modeling_utils.py:2675, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
2672 init_contexts.append(init_empty_weights())
2674 with ContextManagers(init_contexts):
-> 2675 model = cls(config, *model_args, **model_kwargs)
2677 # Check first if we are from_pt
2678 if use_keep_in_fp32_modules:

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:570, in BertModel.__init__(self, config, add_pooling_layer)
568 super(BertModel, self).__init__(config)
569 self.embeddings = BertEmbeddings(config)
--> 570 self.encoder = BertEncoder(config)
571 self.pooler = BertPooler(config) if add_pooling_layer else None
572 self.post_init()

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:345, in BertEncoder.__init__(self, config)
343 def __init__(self, config):
344 super().__init__()
--> 345 layer = BertLayer(config)
346 self.layer = nn.ModuleList(
347 [copy.deepcopy(layer) for _ in range(config.num_hidden_layers)])
349 self.num_attention_heads = config.num_attention_heads

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:303, in BertLayer.__init__(self, config)
301 super(BertLayer, self).__init__()
302 self.attention = BertUnpadAttention(config)
--> 303 self.mlp = BertGatedLinearUnitMLP(config)

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:270, in BertGatedLinearUnitMLP.__init__(self, config)
266 self.config = config
267 self.gated_layers = nn.Linear(config.hidden_size,
268 config.intermediate_size * 2,
269 bias=False)
--> 270 self.act = nn.GELU(approximate='none')
271 self.wo = nn.Linear(config.intermediate_size, config.hidden_size)
272 self.dropout = nn.Dropout(config.hidden_dropout_prob)

TypeError: __init__() got an unexpected keyword argument 'approximate'
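
For context: the approximate keyword of nn.GELU was only added in PyTorch 1.12, so this error usually indicates an older torch installation. A minimal check (this is a diagnosis sketch, not a confirmed fix for this exact environment):

import torch
from packaging import version

print(torch.__version__)
# nn.GELU(approximate=...) only exists in torch >= 1.12; older versions raise
# TypeError: __init__() got an unexpected keyword argument 'approximate'
assert version.parse(torch.__version__.split("+")[0]) >= version.parse("1.12"), \
    "Upgrade PyTorch to >= 1.12 before loading the DNABERT-2 remote code"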

ValueError: setting an array element with a sequence.

Hello, I was testing the finetune code but I got this error:

Traceback (most recent call last):██████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 51.26it/s]
File "bin/DNABERT_2/finetune/train.py", line 286, in
train()
File "bin/DNABERT_2/finetune/train.py", line 268, in train
trainer.train()
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 1645, in train
return inner_training_loop(
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2020, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2321, in _maybe_log_save_evaluate
metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 3053, in evaluate
output = eval_loop(
File "/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 3353, in evaluation_loop
metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
File "bin/DNABERT_2/finetune/train.py", line 204, in compute_metrics
return calculate_metric_with_sklearn(logits, labels)
File "bin/DNABERT_2/finetune/train.py", line 190, in calculate_metric_with_sklearn
predictions = np.argmax(logits, axis=-1)
File "<array_function internals>", line 200, in argmax
File "/miniconda/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 1242, in argmax
return _wrapfunc(a, 'argmax', axis=axis, out=out, **kwds)
File "/miniconda/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
return _wrapit(obj, method, *args, **kwds)
File "/miniconda/envs/dna/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
result = getattr(asarray(obj), method)(*args, **kwds)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 471) + inhomogeneous part.
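
A hedged guess at the cause: during evaluation the model returns a tuple (logits plus extra tensors of different shapes), so np.argmax receives a ragged object. Below is a minimal defensive version of the metric function; the names mirror those in the traceback, but the fix itself is an assumption, not the repository's official patch.

import numpy as np
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

def calculate_metric_with_sklearn(logits, labels):
    # If the model returned (logits, hidden_states, ...), keep only the logits.
    if isinstance(logits, (tuple, list)):
        logits = logits[0]
    predictions = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, predictions),
        "f1": f1_score(labels, predictions, average="macro", zero_division=0),
        "matthews_correlation": matthews_corrcoef(labels, predictions),
    }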

Error when Loading DNABERT Model using `AutoModel`

Issue:

I am trying to load a pretrained DNABERT model from the model hub using the AutoModel and AutoTokenizer classes from the transformers library. However, I am encountering an error related to inconsistent config_class attributes.

Code to Reproduce:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

Error Message:

ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.81ac6a98387cf94bc283553260f3fa6b88cef2fa.configuration_bert.BertConfig'>. Fix one of those so they match!

Environment Details:

  • Python version: 3.10
  • OS: Windows11

human vs non-human data ratio

Dear Zhihan,

I am curious whether there was any evaluation to identify the best ratio of human vs. non-human data. The GENA-LM model included the human T2T assembly and 1000 Genomes Project data to make the human to non-human ratio 1:1.

I am wondering how you thought about this ratio when developing this project/model.

Thanks.

Shicheng

No module named 'transformers_modules.zhihan1996.DNABERT-2-117M.bert_padding'

Hi,

When I run the code from the README.md:

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

I receive the error shown at the bottom of this post. I have created a fresh conda environment and installed the packages listed in the README.md.


---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/Users/colm/repos/dsb-gene-function-prediction/notebooks/exploratory/generate_embeddings_for_dna_sequences.ipynb Cell 1 in 5
      2 from transformers import AutoTokenizer, AutoModelForMaskedLM
      4 tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True, revision="5fd206e")
----> 5 model = AutoModelForMaskedLM.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True, revision="5fd206e")

File ~/anaconda3/envs/DSBPredict/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py:430, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    428     class_ref = config.auto_map[cls.__name__]
    429     module_file, class_name = class_ref.split(".")
--> 430     model_class = get_class_from_dynamic_module(
    431         pretrained_model_name_or_path, module_file + ".py", class_name, **kwargs
    432     )
    433     return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
    434 elif type(config) in cls._model_mapping.keys():

File ~/anaconda3/envs/DSBPredict/lib/python3.8/site-packages/transformers/models/auto/dynamic.py:231, in get_class_from_dynamic_module(pretrained_model_name_or_path, module_file, class_name, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, **kwargs)
    229 # And lastly we get the class inside our newly created module
    230 final_module = os.path.join(full_submodule, module_name.replace(".py", ""))
--> 231 return get_class_in_module(class_name, final_module)

File ~/anaconda3/envs/DSBPredict/lib/python3.8/site-packages/transformers/models/auto/dynamic.py:103, in get_class_in_module(class_name, module_path)
     99 """
    100 Import a module on the cache directory for modules and extract a class from it.
    101 """
...
     25                                             unpad_input, unpad_input_only)
     27 try:
     28     from .flash_attn_triton import flash_attn_qkvpacked_func

ModuleNotFoundError: No module named 'transformers_modules.zhihan1996.DNABERT-2-117M.bert_padding'
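
One hedged guess is a stale copy of the remote code in the local Hugging Face modules cache (an earlier download that predates bert_padding.py). Clearing that cache and re-running the import is a low-risk thing to try; the path below is the default cache location and is an assumption about this setup.

import shutil
from pathlib import Path

# Default location where transformers caches downloaded remote-code modules.
cache = Path.home() / ".cache" / "huggingface" / "modules" / "transformers_modules" / "zhihan1996"
if cache.exists():
    shutil.rmtree(cache)  # forces a fresh download of the DNABERT-2 remote code on the next load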

[Feature request] incorporation of flash-attention 2

Thank you for your exceptional work!
Your model outperforms others with its efficiency in terms of parameters and speed. We have noticed that flash-attention has recently released version 2, which has greatly improved computation speed. We kindly request the incorporation of this update as it is urgently needed.
Once again, thank you for your hard work! :)

On the running results and usage issues of the model

Hi,

Thank you so much for DNABERT-2! I ran into some issues when running the model and using its outputs.

First, when I run the sample code, I find that hidden_states.shape is torch.Size([1, 17, 768]), not torch.Size([1, len(dna), 768]).

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
hidden_states = model(inputs)[0] # [1, sequence_length, 768]

# Question 1 : torch.Size([1, 17, 768]) not torch.Size([1, len(dna), 768])
print(hidden_states.shape)
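
The 17 comes from BPE tokenization: DNABERT-2 tokenizes the sequence into multi-nucleotide tokens plus special tokens, so the second dimension is the number of tokens, not the number of bases. A quick sketch to see this, reusing the tokenizer and inputs from above:

tokens = tokenizer.tokenize(dna)
print(len(tokens), tokens)   # number of BPE tokens for this sequence
print(inputs.shape)          # [1, number of tokens plus special tokens], here [1, 17]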

Secondly, I have observed that model(inputs) returns two tensors, with sizes torch.Size([1, 17, 768]) and torch.Size([1, 768]). What does the second tensor mean?

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors='pt')["input_ids"]

result = model(inputs)

# output: 2
print(len(result))

# Question 2: What is the meaning of result[1].shape?
# output: torch.Size([1, 17, 768]) torch.Size([1, 768])
print(result[0].shape, result[1].shape)
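
A hedged reading of the second output, based on the remote bert_layers.py shown in other tracebacks here (which constructs a BertPooler when add_pooling_layer is enabled): result[1] appears to be the pooled representation of the first ([CLS]) token, of shape [1, 768]. A small sketch comparing it with a simple mean over the token embeddings:

sequence_output, pooled_output = result[0], result[1]
mean_pooled = sequence_output.mean(dim=1)   # [1, 768], mean over the 17 token positions
print(pooled_output.shape, mean_pooled.shape)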

Third, if multiple DNA sequences are input simultaneously with padding enabled, how can I obtain the encoding of each sequence at its own length (instead of the padded length)?

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

dna = ["ACGTAGCATCGGATCTATTTAGC", "ACGTAGCATCGGATCTATCTATCGACACCTATCATCTCGTTAGC", "ACGTATTATCGATCTACGAGCATCTCGTTAGC"]
inputs = tokenizer(dna, return_tensors='pt', padding=True)["input_ids"]
hidden_states = model(inputs)[0]  # [1, sequence_length, 768]

# torch.Size([3, 13, 768])
print(hidden_states.shape)

For the third question, this is how I did it previously with ESM. I want to obtain the same kind of sequence_representations.

import torch

# Load ESM-2 model
model, alphabet = torch.hub.load("facebookresearch/esm:main", "esm2_t33_650M_UR50D")
batch_converter = alphabet.get_batch_converter()
model.eval()  # disables dropout for deterministic results

# Prepare data (first 2 sequences from ESMStructuralSplitDataset superfamily / 4)
data = [
    ("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"),
    ("protein2", "KALTARQQEVFDLIRDHISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE"),
    ("protein2 with mask", "KALTARQQEVFDLIRD<mask>ISQTGMPPTRAEIAQRLGFRSPNAAEEHLKALARKGVIEIVSGASRGIRLLQEE"),
    ("protein3", "K A <mask> I S Q"),
]
batch_labels, batch_strs, batch_tokens = batch_converter(data)
batch_lens = (batch_tokens != alphabet.padding_idx).sum(1)

# Extract per-residue representations (on CPU)
with torch.no_grad():
    results = model(batch_tokens, repr_layers=[33], return_contacts=True)
token_representations = results["representations"][33]

# Output: torch.Size([4, 73, 1280])
print(token_representations.shape)

# Generate per-sequence representations via averaging
# NOTE: token 0 is always a beginning-of-sequence token, so the first residue is token 1.
sequence_representations = []
for i, tokens_len in enumerate(batch_lens):
    sequence_representation = token_representations[i, 1: tokens_len - 1].mean(0)

    # Output: torch.Size([1280])
    print(sequence_representation.shape)

    sequence_representations.append(sequence_representation)
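
For the DNABERT-2 side, a minimal sketch of the same masked averaging, assuming the tokenizer is also asked for the attention_mask so that padded positions are excluded (note that special tokens are still included here, unlike the ESM snippet above):

import torch

batch = tokenizer(dna, return_tensors="pt", padding=True)
input_ids, attention_mask = batch["input_ids"], batch["attention_mask"]
hidden_states = model(input_ids)[0]              # [batch, max_tokens, 768]

mask = attention_mask.unsqueeze(-1).float()      # [batch, max_tokens, 1]
sequence_representations = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1)
print(sequence_representations.shape)            # [batch, 768]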

Thanks !!!

RuntimeError: The expanded size of the tensor (673) must match the existing size (512) at non-singleton dimension 1. Target sizes: [8, 673]. Tensor sizes: [1, 512]

I tried to run the fine-tuning. I put the sequences and labels into CSV files, following the format of the sample data. But when I ran it, the error below occurred and I do not know why. Could anyone help me? Thank you.
trainer.train()

File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train return inner_training_loop(

File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/trainer.py", line 2679, in training_step
loss = self.compute_loss(model, inputs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/trainer.py", line 2704, in compute_loss
outputs = model(**inputs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
output.reraise()
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 1562, in forward
outputs = self.bert(
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/u7343217/.conda/envs/canberra_city/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 988, in forward
buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
RuntimeError: The expanded size of the tensor (673) must match the existing size (512) at non-singleton dimension 1. Target sizes: [8, 673]. Tensor sizes: [1, 512]

It seems that the tensor dimensions do not match, but I did not change any code. The command I ran was: python train.py --model_name_or_path zhihan1996/DNABERT-2-117M --data_path splited/csv_files --kmer -1 --run_name DNABERT2_test1 --model_max_length 700 --per_device_train_batch_size 8 --per_device_eval_batch_size 16 --gradient_accumulation_steps 1 --learning_rate 3e-5 --num_train_epochs 3 --fp16 --save_steps 200 --output_dir output/dnabert2 --evaluation_strategy steps --eval_steps 200 --warmup_steps 50 --logging_steps 100000 --overwrite_output_dir True --log_level info --find_unused_parameters False. I set the max length hyperparameter to 700 because my sequences are long.
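
A hedged observation on the traceback: it goes through transformers/models/bert/modeling_bert.py, i.e. the stock BERT implementation with a 512-position embedding table, rather than the DNABERT-2 remote code (which uses ALiBi and has no such 512-position limit). One thing worth verifying, as an assumption about the cause rather than a confirmed fix, is that the classifier is really instantiated from the remote code:

from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
config.num_labels = 2   # hypothetical label count, adjust to your task
model = AutoModelForSequenceClassification.from_pretrained(
    "zhihan1996/DNABERT-2-117M",
    config=config,
    trust_remote_code=True,   # without this, the stock BertModel (max 512 positions) may be used
)
print(type(model))            # should resolve to a transformers_modules.* class, not transformers.models.bert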

Facing issue while computing embeddings

I used the code for computing embeddings, but it is throwing an error.
dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
hidden_states = model(inputs)[0] # [1, sequence_length, 768]

embedding with mean pooling

embedding_mean = torch.mean(hidden_states[0], dim=0)
print(embedding_mean.shape) # expect to be 768

embedding with max pooling

embedding_max = torch.max(hidden_states[0], dim=0)[0]
print(embedding_max.shape) # expect to be 768

After running this piece of code, I got the following error:

AssertionError Traceback (most recent call last)
in <cell line: 3>()
1 dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
2 inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
----> 3 hidden_states = model(inputs)[0] # [1, sequence_length, 768]
4
5 # embedding with mean pooling

12 frames
~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py in _flash_attn_forward(q, k, v, bias, causal, softmax_scale)
779 assert q.dtype in [torch.float16,
780 torch.bfloat16], 'Only support fp16 and bf16'
--> 781 assert q.is_cuda and k.is_cuda and v.is_cuda
782 softmax_scale = softmax_scale or 1.0 / math.sqrt(d)
783

AssertionError:

Please let me know what I should do.
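
The assertion is raised inside the Triton flash-attention kernel, which only accepts CUDA tensors, while the snippet above runs on the CPU. A minimal sketch of the usual workaround, assuming a GPU is available (without one, this flash-attention path cannot run as written):

import torch

device = "cuda"                       # the Triton kernel requires CUDA tensors
model = model.to(device)
inputs = inputs.to(device)
with torch.no_grad():
    hidden_states = model(inputs)[0]  # [1, sequence_length, 768]
print(hidden_states.shape)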

Cannot import "zhihan1996/DNABERT-2-117M" model from Huggingface

Hi,
As instructed, when I run:

import torch
from transformers import AutoModel
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

I got this error:

ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.81ac6a98387cf94bc283553260f3fa6b88cef2fa.configuration_bert.BertConfig'>. Fix one of those so they match!

And, when I set trust_remote_code=False, I got these warnings:

Some weights of BertModel were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.1.attention.self.key.weight', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.2.attention.self.value.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.embeddings.position_embeddings.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.1.attention.self.query.weight', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.0.attention.self.key.bias', 'bert.encoder.layer.1.attention.self.value.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.1.attention.self.query.bias', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.pooler.dense.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.0.intermediate.dense.weight', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.2.intermediate.dense.weight', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.0.attention.self.value.weight', 'bert.encoder.layer.1.intermediate.dense.weight', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.1.output.LayerNorm.bias', 'bert.encoder.layer.2.attention.self.key.weight', 'bert.encoder.layer.0.output.dense.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.6.output.dense.bias', 
'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.1.attention.self.key.bias', 'bert.encoder.layer.0.attention.self.key.weight', 'bert.encoder.layer.2.output.dense.weight', 'bert.encoder.layer.0.output.LayerNorm.weight', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.1.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.0.attention.self.query.weight', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.0.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.2.attention.self.key.bias', 'bert.encoder.layer.2.attention.self.value.weight', 'bert.encoder.layer.2.attention.self.query.weight', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.0.output.dense.bias', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.1.output.dense.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.2.attention.self.query.bias', 'bert.encoder.layer.0.attention.self.value.bias', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.0.attention.self.query.bias', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.2.output.LayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.2.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.1.output.LayerNorm.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.1.output.dense.bias', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.pooler.dense.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.2.output.dense.bias', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.2.output.LayerNorm.weight', 'bert.encoder.layer.0.intermediate.dense.bias', 
'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.1.attention.self.value.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

What should I do to have the pre-trained version of DNABERT-2-117M?
I want to fine-tune the model for another task.

flash_attn_triton.py

When calling:
hidden_states = model(inputs)[0] # [1, sequence_length, 768]

we receive traceback:

in <cell line: 1>:1 │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:608 in forward │
│ │
│ 605 │ │ │ first_col_mask[:, 0] = True │
│ 606 │ │ │ subset_mask = masked_tokens_mask | first_col_mask │
│ 607 │ │ │
│ ❱ 608 │ │ encoder_outputs = self.encoder( │
│ 609 │ │ │ embedding_output, │
│ 610 │ │ │ attention_mask, │
│ 611 │ │ │ output_all_encoded_layers=output_all_encoded_layers, │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:446 in forward │
│ │
│ 443 │ │ all_encoder_layers = [] │
│ 444 │ │ if subset_mask is None: │
│ 445 │ │ │ for layer_module in self.layer: │
│ ❱ 446 │ │ │ │ hidden_states = layer_module(hidden_states, │
│ 447 │ │ │ │ │ │ │ │ │ │ │ cu_seqlens, │
│ 448 │ │ │ │ │ │ │ │ │ │ │ seqlen, │
│ 449 │ │ │ │ │ │ │ │ │ │ │ None, │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:327 in forward │
│ │
│ 324 │ │ │ attn_mask: None or (batch, max_seqlen_in_batch) │
│ 325 │ │ │ bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch) │
│ 326 │ │ """ │
│ ❱ 327 │ │ attention_output = self.attention(hidden_states, cu_seqlens, seqlen, │
│ 328 │ │ │ │ │ │ │ │ │ │ subset_idx, indices, attn_mask, bias) │
│ 329 │ │ layer_output = self.mlp(attention_output) │
│ 330 │ │ return layer_output │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:240 in forward │
│ │
│ 237 │ │ │ attn_mask: None or (batch, max_seqlen_in_batch) │
│ 238 │ │ │ bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch) │
│ 239 │ │ """ │
│ ❱ 240 │ │ self_output = self.self(input_tensor, cu_seqlens, max_s, indices, │
│ 241 │ │ │ │ │ │ │ │ attn_mask, bias) │
│ 242 │ │ if subset_idx is not None: │
│ 243 │ │ │ return self.output(index_first_axis(self_output, subset_idx), │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/bert_layers.py:181 in forward │
│ │
│ 178 │ │ │ │ qkv = qkv.to(torch.float16) │
│ 179 │ │ │ │ bias_dtype = bias.dtype │
│ 180 │ │ │ │ bias = bias.to(torch.float16) │
│ ❱ 181 │ │ │ │ attention = flash_attn_qkvpacked_func(qkv, bias) │
│ 182 │ │ │ │ attention = attention.to(orig_dtype) │
│ 183 │ │ │ │ bias = bias.to(bias_dtype) │
│ 184 │ │ │ else: │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/autograd/function.py:506 in apply │
│ │
│ 503 │ │ if not torch._C._are_functorch_transforms_active(): │
│ 504 │ │ │ # See NOTE: [functorch vjp and autograd interaction] │
│ 505 │ │ │ args = _functorch.utils.unwrap_dead_wrappers(args) │
│ ❱ 506 │ │ │ return super().apply(*args, **kwargs) # type: ignore[misc] │
│ 507 │ │ │
│ 508 │ │ if cls.setup_context == _SingleLevelFunction.setup_context: │
│ 509 │ │ │ raise RuntimeError( │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/flash_attn_triton.py:1021 in forward │
│ │
│ 1018 │ │ # Make sure that the last dimension is contiguous │
│ 1019 │ │ if qkv.stride(-1) != 1: │
│ 1020 │ │ │ qkv = qkv.contiguous() │
│ ❱ 1021 │ │ o, lse, ctx.softmax_scale = _flash_attn_forward( │
│ 1022 │ │ │ qkv[:, :, 0], │
│ 1023 │ │ │ qkv[:, :, 1], │
│ 1024 │ │ │ qkv[:, :, 2], │
│ │
│ /root/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3e │
│ f4a608677312175eb6f8143d/flash_attn_triton.py:781 in _flash_attn_forward │
│ │
│ 778 │ assert q.dtype == k.dtype == v.dtype, 'All tensors must have the same type' │
│ 779 │ assert q.dtype in [torch.float16, │
│ 780 │ │ │ │ │ torch.bfloat16], 'Only support fp16 and bf16' │
│ ❱ 781 │ assert q.is_cuda and k.is_cuda and v.is_cuda │
│ 782 │ softmax_scale = softmax_scale or 1.0 / math.sqrt(d) │
│ 783 │ │
│ 784 │ has_bias = bias is not None │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError

Error about inconsistent config_class when loading model

Hi,

Thank you so much for updating DNABERT!!!

I'm running into the following error both on the command line and when I try to run the scripts/run_dnabert2.sh script. Do you have any insight into what's going wrong here?

Here's the command line version:

Python 3.8.17 (default, Jul  5 2023, 21:04:15) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nas/longleaf/home/mkratz/.conda/envs/dna/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 487, in from_pretrained
    cls.register(config.__class__, model_class, exist_ok=True)
  File "/nas/longleaf/home/mkratz/.conda/envs/dna/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 513, in register
    raise ValueError(
ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.5fd206e1a13cee3ef4a608677312175eb6f8143d.configuration_bert.BertConfig'>. Fix one of those so they match!

Thanks!!!

Question about pre-trained models

Hello! Congratulations on your work. I was so happy to see this come out!

I have a question about running the GUE tests according to your instructions. In DNABERT-1 you had to physically download the pre-trained model. From looking at the code, it seems the models are now hosted on Hugging Face and I do not need to download the pre-trained models to run these tests. Is that correct?

My goal is to do some further pre-training with bacterial datasets starting from a pre-trained model. To do this, would I load your pre-trained model as a checkpoint and then run a script similar to the pretraining script from DNABERT1?

Thanks for any help you can provide and again, great work!

LeAnn
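
On the continued pre-training question, here is a minimal sketch of the general idea with the Hugging Face Trainer and the masked-language-modeling collator. The dataset handling is purely illustrative (a hypothetical list of sequences), and this is not the repository's official pre-training script.

from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

# Hypothetical bacterial sequences; in practice these would come from your own corpus.
sequences = ["ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"] * 8
encodings = tokenizer(sequences, truncation=True, max_length=128)
dataset = [{"input_ids": ids} for ids in encodings["input_ids"]]

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
args = TrainingArguments(output_dir="dnabert2-continued", per_device_train_batch_size=4,
                         num_train_epochs=1, logging_steps=1)
trainer = Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator)
trainer.train()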

AssertionError: assert q.is_cuda and k.is_cuda and v.is_cuda

I followed the instructions in the README and the following error occurred:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
hidden_states = model(inputs)[0] # [1, sequence_length, 768]

embedding with mean pooling

embedding_mean = torch.mean(hidden_states[0], dim=0)
print(embedding_mean.shape) # expect to be 768

embedding with max pooling

embedding_max = torch.max(hidden_states[0], dim=0)[0]
print(embedding_max.shape) # expect to be 768


/anaconda3/envs/dna/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertModel: ['cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight']

  • This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of BertModel were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

AssertionError Traceback (most recent call last)
Cell In[1], line 9
7 dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
8 inputs = tokenizer(dna, return_tensors = 'pt')["input_ids"]
----> 9 hidden_states = model(inputs)[0] # [1, sequence_length, 768]
11 # embedding with mean pooling
12 embedding_mean = torch.mean(hidden_states[0], dim=0)

File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:608, in BertModel.forward(self, input_ids, token_type_ids, attention_mask, position_ids, output_all_encoded_layers, masked_tokens_mask, **kwargs)
605 first_col_mask[:, 0] = True
606 subset_mask = masked_tokens_mask | first_col_mask
--> 608 encoder_outputs = self.encoder(
609 embedding_output,
610 attention_mask,
611 output_all_encoded_layers=output_all_encoded_layers,
612 subset_mask=subset_mask)
614 if masked_tokens_mask is None:
615 sequence_output = encoder_outputs[-1]

File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:446, in BertEncoder.forward(self, hidden_states, attention_mask, output_all_encoded_layers, subset_mask)
444 if subset_mask is None:
445 for layer_module in self.layer:
--> 446 hidden_states = layer_module(hidden_states,
447 cu_seqlens,
448 seqlen,
449 None,
450 indices,
451 attn_mask=attention_mask,
452 bias=alibi_attn_mask)
453 if output_all_encoded_layers:
454 all_encoder_layers.append(hidden_states)

File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:327, in BertLayer.forward(self, hidden_states, cu_seqlens, seqlen, subset_idx, indices, attn_mask, bias)
305 def forward(
306 self,
307 hidden_states: torch.Tensor,
(...)
313 bias: Optional[torch.Tensor] = None,
314 ) -> torch.Tensor:
315 """Forward pass for a BERT layer, including both attention and MLP.
316
317 Args:
(...)
325 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
326 """
--> 327 attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
328 subset_idx, indices, attn_mask, bias)
329 layer_output = self.mlp(attention_output)
330 return layer_output

File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:240, in BertUnpadAttention.forward(self, input_tensor, cu_seqlens, max_s, subset_idx, indices, attn_mask, bias)
218 def forward(
219 self,
220 input_tensor: torch.Tensor,
(...)
226 bias: Optional[torch.Tensor] = None,
227 ) -> torch.Tensor:
228 """Forward pass for scaled self-attention without padding.
229
230 Arguments:
(...)
238 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
239 """
--> 240 self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
241 attn_mask, bias)
242 if subset_idx is not None:
243 return self.output(index_first_axis(self_output, subset_idx),
244 index_first_axis(input_tensor, subset_idx))

File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:181, in BertUnpadSelfAttention.forward(self, hidden_states, cu_seqlens, max_seqlen_in_batch, indices, attn_mask, bias)
179 bias_dtype = bias.dtype
180 bias = bias.to(torch.float16)
--> 181 attention = flash_attn_qkvpacked_func(qkv, bias)
182 attention = attention.to(orig_dtype)
183 bias = bias.to(bias_dtype)

File ~/anaconda3/envs/dna/lib/python3.8/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs)
503 if not torch._C._are_functorch_transforms_active():
504 # See NOTE: [functorch vjp and autograd interaction]
505 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506 return super().apply(*args, **kwargs) # type: ignore[misc]
508 if cls.setup_context == _SingleLevelFunction.setup_context:
509 raise RuntimeError(
510 'In order to use an autograd.Function with functorch transforms '
511 '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
512 'staticmethod. For more details, please see '
513 'https://pytorch.org/docs/master/notes/extending.func.html')

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:1021, in _FlashAttnQKVPackedFunc.forward(ctx, qkv, bias, causal, softmax_scale)
1019 if qkv.stride(-1) != 1:
1020 qkv = qkv.contiguous()
-> 1021 o, lse, ctx.softmax_scale = _flash_attn_forward(
1022 qkv[:, :, 0],
1023 qkv[:, :, 1],
1024 qkv[:, :, 2],
1025 bias=bias,
1026 causal=causal,
1027 softmax_scale=softmax_scale)
1028 ctx.save_for_backward(qkv, o, lse, bias)
1029 ctx.causal = causal

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:781, in _flash_attn_forward(q, k, v, bias, causal, softmax_scale)
778 assert q.dtype == k.dtype == v.dtype, 'All tensors must have the same type'
779 assert q.dtype in [torch.float16,
780 torch.bfloat16], 'Only support fp16 and bf16'
--> 781 assert q.is_cuda and k.is_cuda and v.is_cuda
782 softmax_scale = softmax_scale or 1.0 / math.sqrt(d)
784 has_bias = bias is not None

AssertionError:

config_class error

model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True) triggers an error:

The model class you are passing has a config_class attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.zhihan1996.DNABERT-2-117M.81ac6a98387cf94bc283553260f3fa6b88cef2fa.configuration_bert.BertConfig'>. Fix one of those so they match!

Any idea how to bypass this?
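
One workaround that has helped some users (this is an assumption on my part; the mismatch seems to show up with newer transformers releases) is to load the built-in BertConfig explicitly and pass it in, so the config class and the model's config_class match:

from transformers import AutoModel
from transformers.models.bert.configuration_bert import BertConfig

# load the built-in BertConfig so it matches the config_class declared by the remote model code
config = BertConfig.from_pretrained("zhihan1996/DNABERT-2-117M")
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True, config=config)

If that does not help, pinning transformers to an older release is another option worth trying.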

Another question: after obtaining the token embeddings, is there any way to convert them back into embeddings for each nucleotide? Thanks!
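
On the second question, the tokenizer is BPE over nucleotides, so each token covers a variable number of bases. A rough sketch of one way to do it (my own approach, not an official utility, and it assumes the non-special tokens are plain nucleotide strings so that len(token) equals the number of bases it covers) is to repeat each token's embedding once per base:

import torch

def token_to_nucleotide_embeddings(hidden_states, input_ids, tokenizer):
    """Expand per-token embeddings to per-nucleotide embeddings for one sequence."""
    # hidden_states: [seq_len, hidden_dim]; input_ids: [seq_len]
    tokens = tokenizer.convert_ids_to_tokens(input_ids.tolist())
    per_base = []
    for tok, emb in zip(tokens, hidden_states):
        if tok in tokenizer.all_special_tokens:  # skip [CLS], [SEP], [PAD], ...
            continue
        per_base.append(emb.unsqueeze(0).expand(len(tok), -1))  # one copy per base in this token
    return torch.cat(per_base, dim=0)  # [num_nucleotides, hidden_dim]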

Update requirements with version numbers

Hi,

I just wanted to note that installing requirements.txt in a clean environment pulls in package versions that are not mutually compatible, which leads to errors with triton. Installing some of the versions listed in this discussion fixed it for me.

Thanks again for your work!
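
In the meantime, a simple way to capture a known-good set of pins (plain pip, nothing specific to this repo) is to export them from an environment where fine-tuning already runs cleanly:

# in an environment where fine-tuning runs without triton errors
pip freeze | grep -iE "torch|triton|transformers|peft|accelerate" > working-versions.txt
# reproduce that environment elsewhere
pip install -r working-versions.txt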

RuntimeError: Triton Error [CUDA]: invalid argument when running run_dnabert2.sh

What is the problem here, and how can it be solved?

The provided data_path is /home/shiro/DNABERT_2/finetune
2023-08-31 17:57:18.856636: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/local/cuda/lib64:
2023-08-31 17:57:18.856685: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias']

  • This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of BertForSequenceClassification were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['classifier.weight', 'bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'classifier.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    Using cuda_amp half precision backend
    ***** Running training *****
    Num examples = 36,496
    Num Epochs = 5
    Instantaneous batch size per device = 8
    Total train batch size (w. parallel, distributed & accumulation) = 32
    Gradient Accumulation steps = 4
    Total optimization steps = 5,700
    Number of trainable parameters = 117,070,851
    0%| | 0/5700 [00:00<?, ?it/s]Traceback (most recent call last):
    File "", line 21, in _bwd_kernel
    KeyError: ('2-.-0-.-0-1e8410f206c822547fb50e2ea86e45a6-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-42648570729a4835b21c1c18cebedbfe-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, torch.float16, torch.float32, torch.float16, torch.float32, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, False, True, 128, 128), (True, True, True, True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train_modified.py", line 332, in
train()
File "train_modified.py", line 314, in train
trainer.train()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/transformers/trainer.py", line 2745, in training_step
self.scaler.scale(loss).backward()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/autograd/init.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/home/shiro/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line 1041, in backward
_flash_attn_backward(do,
File "/home/shiro/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/81ac6a98387cf94bc283553260f3fa6b88cef2fa/flash_attn_triton.py", line 949, in _flash_attn_backward
_bwd_kernel[grid]( # type: ignore
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 73, in run
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 73, in
timings = {config: self._bench(*args, config=config, **kwargs)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 63, in _bench
return do_bench(kernel_call)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/testing.py", line 140, in do_bench
fn()
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 62, in kernel_call
self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
File "/home/shiro/miniconda3/envs/dnabert2/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 200, in run
return self.fn.run(*args, **kwargs)
File "", line 43, in _bwd_kernel
RuntimeError: Triton Error [CUDA]: invalid argument
0%| | 0/5700 [00:00<?, ?it/s
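
One workaround that has worked for some users (this is an assumption, not an official fix): the custom model code falls back to a plain PyTorch attention implementation when Triton cannot be imported, so removing Triton from the environment sidesteps the failing kernel at the cost of some throughput:

pip uninstall -y triton
# re-run the fine-tuning script; the model should warn that it is defaulting the
# attention implementation to PyTorch, and training proceeds (more slowly) on the GPU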

Benchmark performance

Is the performance reported in Table 4 the F1 score? I always get a higher score when I run run_dnabert2.sh...
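
If I remember the paper correctly, most GUE results are reported as MCC rather than F1, which could explain a consistently higher number; please double-check against the paper. A quick sketch for computing both metrics from the same predictions (assuming scikit-learn is available; the arrays below are placeholders for your saved evaluation outputs):

from sklearn.metrics import f1_score, matthews_corrcoef

# replace with the labels and predictions saved from your evaluation run
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

print("macro F1:", f1_score(y_true, y_pred, average="macro"))
print("MCC:", matthews_corrcoef(y_true, y_pred))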

Redundancy and issues in the code

Hey,

Congrats on your contribution, but the code has some problems. It may well have worked when you trained the models, but right now there are some severe issues. Please don't get me wrong; I hope you fix them, or give me an opportunity to contribute a fix.

The key issues with DNABERT-2

  1. ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 3168) + inhomogeneous part. (A possible workaround is sketched below.)
  2. The .sh script keeps running even after the error is printed (poor error handling in the script).
  3. The training stops partway through and the code jumps to evaluation, where the ValueError shown above occurs.
  4. There is no epoch tracking, which makes it hard to monitor training progress.

Are you planning to fix these issues? Thanks for the Hugging Face model, by the way; that one seems to be working fine!
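
On point 1, a possible workaround (an assumption on my part: the ragged array appears to come from the evaluation loop trying to stack per-batch outputs of different shapes) is to reduce the model output to fixed-shape class predictions before the Trainer accumulates them:

import torch

def preprocess_logits_for_metrics(logits, labels):
    # some model outputs come back as tuples; keep only the classification logits
    if isinstance(logits, tuple):
        logits = logits[0]
    # return a fixed-shape tensor per batch so the evaluation loop can stack them
    return torch.argmax(logits, dim=-1)

# hypothetical usage: pass this to the Trainer and adapt compute_metrics to expect
# class indices instead of raw logits
# trainer = transformers.Trainer(..., preprocess_logits_for_metrics=preprocess_logits_for_metrics)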

How to get mean embeddings from model

Hi, do you have any examples of how to extract sequence embeddings using this model?

I tried the following code but get an error:

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M")
model = AutoModelForMaskedLM.from_pretrained("zhihan1996/DNABERT-2-117M")

tok = tokenizer("ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC", return_tensors = 'pt')
outs = model(tok)

Gives error below:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:254, in BatchEncoding.__getattr__(self, item)
    253 try:
--> 254     return self.data[item]
    255 except KeyError:

KeyError: 'size'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In[67], line 1
----> 1 outs = model(tok)

File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:1358, in BertForMaskedLM.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, labels, output_attentions, output_hidden_states, return_dict)
   1349 r"""
   1350 labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
   1351     Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
   1352     config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the
   1353     loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`
   1354 """
   1356 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
-> 1358 outputs = self.bert(
   1359     input_ids,
   1360     attention_mask=attention_mask,
   1361     token_type_ids=token_type_ids,
   1362     position_ids=position_ids,
   1363     head_mask=head_mask,
   1364     inputs_embeds=inputs_embeds,
   1365     encoder_hidden_states=encoder_hidden_states,
   1366     encoder_attention_mask=encoder_attention_mask,
   1367     output_attentions=output_attentions,
   1368     output_hidden_states=output_hidden_states,
   1369     return_dict=return_dict,
   1370 )
   1372 sequence_output = outputs[0]
   1373 prediction_scores = self.cls(sequence_output)

File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py:968, in BertModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    966     raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    967 elif input_ids is not None:
--> 968     input_shape = input_ids.size()
    969 elif inputs_embeds is not None:
    970     input_shape = inputs_embeds.size()[:-1]

File ~/mambaforge/envs/predictor/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:256, in BatchEncoding.__getattr__(self, item)
    254     return self.data[item]
    255 except KeyError:
--> 256     raise AttributeError

AttributeError: 
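
Two things seem to be going on in the snippet above: the whole BatchEncoding is passed to the model instead of the input_ids tensor, and AutoModelForMaskedLM without trust_remote_code loads the stock BERT implementation rather than the DNABERT-2 classes. A sketch of one way to get a mean-pooled sequence embedding (closely following the pattern in the repository's quickstart; treat the exact output shapes as an assumption):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors="pt")["input_ids"]   # pass the tensor, not the whole BatchEncoding
hidden_states = model(inputs)[0]                            # [1, sequence_length, 768]

embedding_mean = torch.mean(hidden_states[0], dim=0)        # mean pooling over tokens -> [768]
embedding_max = torch.max(hidden_states[0], dim=0)[0]       # max pooling, as an alternative -> [768]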
