Hello, first of all, thank you for your open-source contributions and the detailed REA

Some doubts about downstream tasks (hyena-dna, 4 comments, closed)

hazyresearch commented on September 16, 2024

Some doubts about downstream tasks

from hyena-dna.

Comments (4)

exnx commented on September 16, 2024

Not sure what you mean by DNA tokenization as a task. We just treat every character as a token (i.e., the smallest unit of data fed into the model), so there's nothing to learn.
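As a rough illustration of what "treat every character as a token" means, here is a minimal sketch. The vocabulary and integer ids below are hypothetical, chosen only for illustration; they are not hyena-dna's actual tokenizer mapping. The point is that there is no segmentation step to learn: each nucleotide maps directly to one id.

```python
# Hypothetical character vocabulary; NOT hyena-dna's real token ids.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}
INV_VOCAB = {i: ch for ch, i in VOCAB.items()}


def char_tokenize(seq: str) -> list[int]:
    """Map each nucleotide character to its own integer token id."""
    return [VOCAB[ch] for ch in seq.upper()]


def char_detokenize(ids: list[int]) -> str:
    """Invert char_tokenize by mapping ids back to characters."""
    return "".join(INV_VOCAB[i] for i in ids)


if __name__ == "__main__":
    ids = char_tokenize("AGCTAGCT")
    print(ids)                   # [0, 2, 1, 3, 0, 2, 1, 3]
    print(char_detokenize(ids))  # AGCTAGCT
```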


zhguo9 commented on September 16, 2024

Sorry for the unclear wording!

By "DNA tokenization" I mean segmenting a DNA sequence into words. For example, in English we segment "howareyou" into "how", "are", and "you". For DNA, we might segment "AGCTAGCT" into two words: "AGC" and "TAGCT".

I want to break long DNA sequences into meaningful words in order to uncover the secrets of the non-coding regions of DNA.

I already understand what you mean by "treat every character as a token," but I am still not sure whether your model can handle this specific task.
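The thread leaves open where a DNA "word" vocabulary would come from. Assuming one is already available (the vocabulary in the example is made up for illustration), one simple way to segment a sequence such as "AGCTAGCT" is dynamic programming that picks the split using the fewest words. This is a sketch of one possible criterion, not something hyena-dna provides.

```python
from typing import Optional


def segment(seq: str, vocab: set[str]) -> Optional[list[str]]:
    """Split seq into the fewest words from vocab; return None if no split exists."""
    n = len(seq)
    max_len = max((len(w) for w in vocab), default=0)
    # best[i] = fewest words covering seq[:i]; back[i] = start index of the last word
    best = [0] + [float("inf")] * n
    back = [-1] * (n + 1)
    for end in range(1, n + 1):
        # Only candidate words up to the longest vocabulary entry need checking.
        for start in range(max(0, end - max_len), end):
            if best[start] + 1 < best[end] and seq[start:end] in vocab:
                best[end] = best[start] + 1
                back[end] = start
    if best[n] == float("inf"):
        return None
    # Walk the back-pointers from the end to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(seq[start:end])
        end = start
    return words[::-1]


if __name__ == "__main__":
    toy_vocab = {"AGC", "TAGCT", "AG", "CT", "T"}  # hypothetical vocabulary
    print(segment("AGCTAGCT", toy_vocab))  # ['AGC', 'TAGCT']
```

"Fewest words" is only one objective; the hard part of the question, deciding which words are biologically meaningful, is not addressed by the segmentation step itself.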


exnx commented on September 16, 2024

People usually use byte pair encoding (BPE) tokenizers to learn meaningful groupings of characters in natural language. BPE is based on the frequency of subwords, though, not on their semantics.

I don't know how you would do that here. I'm guessing there are DNA motif finding algorithms, but I wouldn't know where to begin, sorry.
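To make the frequency point concrete, here is a minimal, unoptimized BPE trainer applied to a toy DNA corpus. The corpus and function names are hypothetical; real BPE tokenizer libraries differ in details such as tie-breaking and special tokens. Each step merges the most frequent adjacent pair, so a unit like "AGC" emerges only because it repeats, not because it carries biological meaning.

```python
from collections import Counter


def merge_pair(toks: list[str], a: str, b: str) -> list[str]:
    """Replace every non-overlapping occurrence of the pair (a, b) with a + b."""
    out = []
    i = 0
    while i < len(toks):
        if i + 1 < len(toks) and toks[i] == a and toks[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(toks[i])
            i += 1
    return out


def learn_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merges from raw strings, starting from single characters."""
    seqs = [list(s) for s in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for toks in seqs:
            pairs.update(zip(toks, toks[1:]))
        if not pairs:
            break
        # Ties go to the pair seen first, since most_common(1) keeps the first maximum.
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges would not compress anything
        merges.append((a, b))
        seqs = [merge_pair(toks, a, b) for toks in seqs]
    return merges


def apply_bpe(seq: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize a new sequence by replaying the learned merges in order."""
    toks = list(seq)
    for a, b in merges:
        toks = merge_pair(toks, a, b)
    return toks


if __name__ == "__main__":
    merges = learn_bpe(["AGCTAGCT", "AGCAGC"], num_merges=2)
    print(merges)                          # [('A', 'G'), ('AG', 'C')]
    print(apply_bpe("AGCTAGCT", merges))   # ['AGC', 'T', 'AGC', 'T']
```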


zhguo9 commented on September 16, 2024

OK, thanks again for your time and patience! I will explore this repo and other methods further.

Thanks again!
