ddsm's People

Contributors

guillaumehu, jzthree, pavelavdeyev


ddsm's Issues

Specific PyTorch Version

Which specific version of PyTorch do you recommend? Selene's official documentation only supports up to PyTorch 1.9, and I have been running into issues installing Selene myself, so I would like to know which PyTorch version you are using.

memmap conversion issue

Hi Pavel,

I installed selene-sdk==0.5.3 and downloaded the data as instructed. However, when I run make_genome_memmap.py, the following error appears:

ModuleNotFoundError: No module named 'selene_utils'

I realize it comes from the line from selene_utils import MemmapGenome, so I tried to install a selene_utils package independently, but I could not find one. Could you provide any guidance on what is going on?
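For what it's worth, a likely cause (my assumption, not confirmed by the maintainers): selene_utils is a local module that ships inside the ddsm repository itself, not a package on PyPI. A minimal sketch of a workaround, with "/path/to/ddsm" standing in for a hypothetical checkout location:

```python
import sys
from pathlib import Path

# selene_utils is (presumably) a module in the ddsm repo, e.g. selene_utils.py
# sitting next to make_genome_memmap.py. If the script is run from another
# directory, put the repo root on sys.path before the import runs.
repo_root = Path("/path/to/ddsm")  # hypothetical checkout location
sys.path.insert(0, str(repo_root))

# After this, the import inside make_genome_memmap.py should resolve:
# from selene_utils import MemmapGenome
```

The simpler alternative is just to run the script from the repository root, since Python adds the script's own directory to the module search path.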

JAX version

Hello,

In the README you state

The Jax version of the code will be published soon.

Is there any update or timeline regarding this Jax release? Thanks!

How to control GPU allocation in high-dimensional pre-sampling?

Hi,
How can I control how much GPU RAM is allocated during pre-sampling?
I have noticed that pre-sampling categoricals with more than 4-5 dimensions needs a lot of memory.
For instance, although the 2- and 4-dimensional examples (promoter and Bernoulli) run fine,
I get the following error when running the sudoku (9-dim) pre-sampling on a 24 GB GPU:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.96 GiB. GPU 0 has a total capacty of 23.65 GiB of which 5.27 GiB is free. Process 643229 has 18.34 GiB memory in use. Of the allocated memory 17.89 GiB is allocated by PyTorch, and 9.78 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
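Two levers that may help here (a sketch of general PyTorch practice, not the repo's own API; presample_in_chunks is a hypothetical helper): set the allocator option the error message itself suggests, and bound peak memory by pre-sampling in chunks and moving each chunk off the GPU as soon as it is produced.

```python
import os

# Must be set before PyTorch initializes CUDA. max_split_size_mb limits
# allocator fragmentation, but cannot help if a single allocation is
# simply larger than the free memory on the device.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# A more reliable lever: pre-sample in chunks so that peak GPU memory
# scales with the chunk size, not the total pre-sample count.
def presample_in_chunks(sample_fn, total, chunk=1024):
    """Call sample_fn(n) repeatedly; sample_fn should return data already
    moved off the GPU (e.g. via .cpu()) so only one chunk is resident."""
    out = []
    start = 0
    while start < total:
        n = min(chunk, total - start)
        out.append(sample_fn(n))
        start += n
    return out
```

With a 9-dimensional categorical the pre-sampled tensors grow quickly, so shrinking the chunk size is usually the first thing to try.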

Confused about discretization

Thanks for the great paper and code :)

I am confused about how the continuous $x$ values get mapped to discrete values during the reverse diffusion process for Univariate Jacobi Diffusion (referring to the Univariate Jacobi Diffusion figure in your paper).

Or is it that the diffusion technically occurs over discrete probability distributions (which are obtained as samples from the reverse diffusion process)? And the final categorical values are obtained by again sampling from the sampled discrete probability distributions?
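To make my question concrete, here is my current understanding as a sketch (an assumption on my part, not the authors' confirmed answer; discretize is a hypothetical helper): the reverse diffusion runs over points on the probability simplex, so the final state is a probability vector per sequence position, and a discrete sequence is obtained by sampling from (or taking the argmax of) each position's categorical distribution.

```python
import numpy as np

def discretize(x0, rng=None, greedy=False):
    """Map final reverse-diffusion states to discrete tokens.

    x0: array of shape (seq_len, num_classes), each row a point on the
        probability simplex produced by the reverse process.
    greedy: if True take the most probable class; otherwise sample
        each position from its categorical distribution.
    """
    if greedy:
        return x0.argmax(axis=-1)
    if rng is None:
        rng = np.random.default_rng()
    # Renormalize each row to guard against small numerical drift.
    return np.array([rng.choice(x0.shape[-1], p=p / p.sum()) for p in x0])
```

Is this roughly what happens, or does the discretization enter somewhere else in the reverse process?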

Also, any update on the JAX code would be very much appreciated :)

Promoter dataset source

Thank you for the interesting paper!

I am having trouble understanding which exact files you used from the FANTOM5 database and how you converted them into the files provided on the Zenodo platform. This is not made clear in the paper, and as far as I can see it is not stated in the code either.
Could you please add some information to the README about which FANTOM5 files you used and the code you used to preprocess them?

Thank you in advance!

The performance of DDSM for unconditional DNA generation

Dear Team,

I have been working on generative models for DNA sequences. For a fair comparison, I compare different algorithms on unconditional generation. It seems that DDSM fails to capture the motif distribution in unconditional DNA sequence generation; by unconditional generation, I mean that the transcription profile is not supplied as a condition.

I wonder whether you have tried DDSM for unconditional DNA sequence generation and what the expected result is.

PS: I tried both with and without time dilation, and the generated samples do not seem to capture the motif distribution of the input sequences. The training script is available.

Best,
Zehui
