
dna2vec's Issues

Length of the longest string I can encode

Hi,
I would like to know which parameters I should use to get the vector representation of a string of length 45.
At the moment I cannot go beyond 25.

Encoding longer sequences

Is there already any implemented function to encode longer sequences (such as sequencing reads) using their k-mers embeddings?
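
One common way to approach this, sketched here as an assumption and not as a function dna2vec is known to provide, is to split the read into overlapping k-mers and average their vectors. The names `kmers`, `embed_sequence` and `toy_table` are hypothetical, and the toy table stands in for vectors loaded from a trained model.

```python
from typing import Dict, List, Sequence


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed_sequence(seq: str, k: int, table: Dict[str, Sequence[float]]) -> List[float]:
    """Average the vectors of every k-mer of seq that is present in table."""
    found = [table[km] for km in kmers(seq, k) if km in table]
    if not found:
        raise ValueError("no k-mer of the sequence is in the embedding table")
    dim = len(found[0])
    return [sum(vec[d] for vec in found) / len(found) for d in range(dim)]


# Toy 2-dimensional table (assumption); real vectors would come from a trained model.
toy_table = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0]}
read_vector = embed_sequence("ACGT", 3, toy_table)
```

Averaging discards the order of the k-mers, so it is only a baseline; whether it suits sequencing reads would need to be checked on real data.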

mm10

Do you have pretrained vectors for mm10?

Thanks

Incorrect embedding dimension after training

I want to use dna2vec on the E. coli genome.
When I set 2<=k<=8, I got (86479,100);
when I set 3<=k<=8, I got (86614,100), whereas the correct dimension should be (87360,100), since $87360+16=4^2+4^3+4^4+4^5+4^6+4^7+4^8$.
So I don't know why I got two different results.
I also checked every k-mer length from 2 to 8 and found that the dimension is correct for k from 2 to 7.
However, for k=8 the dimension is (64450,100) rather than (65536,100), and $65536-64450 \neq 87360-86614$.
This is horrible! The numbers do not match anywhere.
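
The arithmetic in this report can be checked with a short sketch. The helper name `full_vocab_size` is hypothetical. The totals it returns are upper bounds: a k-mer only gets a row if it actually occurs in the training data, and whether that explains the gaps reported here is an assumption, not a confirmed diagnosis.

```python
def full_vocab_size(k_low: int, k_high: int, alphabet_size: int = 4) -> int:
    """Number of distinct k-mers over the alphabet for k_low <= k <= k_high."""
    return sum(alphabet_size ** k for k in range(k_low, k_high + 1))


size_3_to_8 = full_vocab_size(3, 8)        # upper bound for 3<=k<=8
size_2_to_8 = full_vocab_size(2, 8)        # upper bound for 2<=k<=8

# Gaps between the upper bounds and the row counts quoted in the report.
missing_8mers = full_vocab_size(8, 8) - 64450
missing_3_to_8 = size_3_to_8 - 86614
missing_2_to_8 = size_2_to_8 - 86479

print(size_3_to_8, size_2_to_8, missing_8mers, missing_3_to_8, missing_2_to_8)
```

The three gaps come out as 1086, 746 and 897 rows, so the runs disagree with each other as well as with the upper bound.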

dna2vec against large dataset

We are trying to run dna2vec against a large database (the NCBI nt dataset), which has ~47M sequences in it. Do you know of any issues with doing this (aside from it taking a really long time)?

I am seeing that the PROGRESS messages report we are on sentence 105M, but we are still on epoch #1. Based on the progress messages, I think we should be on epoch 3.

Do you have any thoughts on why this might be the case?

Pre-image / component mapping?

Hi,
Can you please make it explicit how to obtain a pre-image from a mapped vector?
Additionally, can you explain how the components v_j of the vectors in V are related to the sequence components s_i in the sequence space S?

Best wishes

installation/training fails unless run from scripts folder

$ python3 ./scripts/train_dna2vec.py -c configs/small_example.yml
Traceback (most recent call last):
  File "./scripts/train_dna2vec.py", line 12, in <module>
    from attic_util.time_benchmark import Benchmark
ImportError: No module named 'attic_util'

This is executed from ~/dna2vec.

The reason for this is that in train_dna2vec.py the relative paths to attic_util and dna2vec are appended to sys.path. Python resolves the '../' against the folder the script was called from, not against the folder that contains the script.

The workaround is easy: just call the script from within ./scripts.

For a cleaner implementation, though, it might be better to use an egg or some other packaging setup that allows attic_util and dna2vec to be imported from elsewhere.
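
Short of packaging, a minimal sketch of a path fix is to derive the repository root from the script's own location, not from the current directory. The helper names `repo_root_for` and `add_repo_root_to_path` are hypothetical, and the sketch assumes the script lives one level below the repository root, as the `./scripts/train_dna2vec.py` path above suggests.

```python
import os
import sys


def repo_root_for(script_path: str) -> str:
    """Return the parent of the directory that contains script_path."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.dirname(script_dir)


def add_repo_root_to_path(script_path: str) -> str:
    """Put the repository root on sys.path, independent of the working directory."""
    root = repo_root_for(script_path)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

Calling `add_repo_root_to_path(__file__)` at the top of the script, before the `attic_util` and `dna2vec` imports, should then work from any working directory.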

Pretrained set?

Hi,
What genome/sequence was the pretrained set trained on? Can you make this available? I am running some initial experiments and would rather not lose time training dna2vec for my proof of concept.

Thank you!

AttributeError: 'Word2Vec' object has no attribute 'wv'

Description:

python3 ./scripts/train_dna2vec.py -c configs/small_example.yml

Then:

File "./scripts/train_dna2vec.py", line 55, in write_vec
    self.model.wv.save_word2vec_format(out_filename, binary=False)
AttributeError: 'Word2Vec' object has no attribute 'wv'

Exception:

Traceback (most recent call last):
  File "./scripts/train_dna2vec.py", line 142, in <module>
    main()
  File "./scripts/train_dna2vec.py", line 139, in main
    run_main(args, inputs, out_fileroot)
  File "./scripts/train_dna2vec.py", line 88, in run_main
    learner.write_vec()
  File "./scripts/train_dna2vec.py", line 55, in write_vec
    self.model.wv.save_word2vec_format(out_filename, binary=False)
AttributeError: 'Word2Vec' object has no attribute 'wv'

Environment (installed with pip install -r requirements.txt):

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
_openmp_mutex             4.5                       2_gnu    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
arrow                     0.8.0                    pypi_0    pypi
biopython                 1.68                     pypi_0    pypi
boto                      2.46.1                   pypi_0    pypi
bz2file                   0.98                     pypi_0    pypi
bzip2                     1.0.8                h7f98852_4    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
ca-certificates           2022.6.15            ha878542_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
certifi                   2022.6.15                pypi_0    pypi
chardet                   3.0.4                    pypi_0    pypi
configargparse            0.11.0                   pypi_0    pypi
gensim                    0.13.2                   pypi_0    pypi
idna                      2.7                      pypi_0    pypi
ld_impl_linux-64          2.36.1               hea4e1c9_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libffi                    3.4.2                h7f98852_5    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgcc-ng                 12.1.0              h8d9b700_16    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgomp                   12.1.0              h8d9b700_16    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libnsl                    2.0.0                h7f98852_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libstdcxx-ng              12.1.0              ha89aaad_16    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libuuid                   2.32.1            h7f98852_1000    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libzlib                   1.2.12               h166bdaf_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
logbook                   1.0.0                    pypi_0    pypi
ncurses                   6.3                  h27087fc_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
numpy                     1.16.0                   pypi_0    pypi
openssl                   1.1.1p               h166bdaf_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pep8                      1.7.0                    pypi_0    pypi
pip                       21.2.4             pyhd8ed1ab_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pluggy                    0.4.0                    pypi_0    pypi
py                        1.4.33                   pypi_0    pypi
pytest                    3.0.7                    pypi_0    pypi
python                    3.6.15          hb7a2778_0_cpython    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
python-dateutil           2.6.0                    pypi_0    pypi
readline                  8.1.2                h0f457ee_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
requests                  2.20.0                   pypi_0    pypi
scipy                     0.19.0                   pypi_0    pypi
setuptools                36.4.0                   py36_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
six                       1.10.0                   pypi_0    pypi
smart-open                1.5.1                    pypi_0    pypi
sqlite                    3.39.0               h4ff8645_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
tk                        8.6.12               h27826a3_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
tox                       2.7.0                    pypi_0    pypi
tox-pyenv                 1.0.3                    pypi_0    pypi
tzdata                    2022a                h191b570_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
urllib3                   1.24.3                   pypi_0    pypi
virtualenv                15.1.0                   pypi_0    pypi
wheel                     0.37.1             pyhd8ed1ab_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xz                        5.2.5                h516909a_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
zlib                      1.2.12               h166bdaf_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
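
The listing shows gensim 0.13.2, while the failing line calls `self.model.wv`. A plausible reading, offered as an assumption and not as a confirmed diagnosis, is that this gensim release predates the `wv` attribute and still exposes `save_word2vec_format` on the `Word2Vec` object itself. Under that assumption, a small compatibility sketch (the function name `save_vectors` is hypothetical) would be:

```python
def save_vectors(model, out_filename: str, binary: bool = False) -> None:
    """Write vectors in word2vec format whether or not the model exposes `wv`."""
    # Newer gensim keeps the vectors on model.wv; older releases put
    # save_word2vec_format directly on the Word2Vec object, so fall back to it.
    holder = getattr(model, "wv", model)
    holder.save_word2vec_format(out_filename, binary=binary)
```

The alternative is to install a gensim version that matches what the script expects; which version that is would need to be checked against the gensim changelog.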
