millanp95 / delucs
This repository contains all the source files required to run DeLUCS, a deep learning clustering algorithm for DNA sequences.
Dear authors,
Your work is very valuable. I am a doctor working in a hospital and am not very familiar with the code.
I have a question: how can I tell which specific sequences contributed to the classification?
Thanks very much.
Hi,
Thanks for developing this tool!
While running EvaluateDeLUCS.py on the test files (test 1), I got the following error message:
Traceback (most recent call last):
File "EvaluateDeLUCS.py", line 3, in <module>
import torch
File "/src/python3/envs/delucs/lib/python3.7/site-packages/torch/__init__.py", line 84, in <module>
from torch._C import *
ImportError: /src/python3/envs/delucs/lib/python3.7/site-packages/torch/lib/libmkldnn.so.0: undefined symbol: cblas_sgemm_alloc
I tried installing Intel's mkl (through conda install), but I still get the same error message.
Thanks,
Best regards,
Gautam
In the paper you explicitly specify the clustering of DNA sequences. Can DeLUCS be used to cluster non-DNA sequences, for example sequences of RNA viruses, or only specific genes?
TrainDeLUCS.py line 116
SingleRun.py line 114
parser.add_argument('--n_custers', action='store', type=int, default=0)
typo: n_custers => n_clusters
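To illustrate why the spelling matters, here is a minimal sketch that is not taken from the repository. It assumes only the add_argument line quoted above; the two parser objects and the sample argument list are hypothetical. With the misspelled flag, a user who passes --n_clusters gets the default value and the flag is left unrecognized.

```python
import argparse

# Parser as quoted in the report, with the misspelled flag.
typo_parser = argparse.ArgumentParser()
typo_parser.add_argument('--n_custers', action='store', type=int, default=0)

# Parser with the proposed spelling (hypothetical fix).
fixed_parser = argparse.ArgumentParser()
fixed_parser.add_argument('--n_clusters', action='store', type=int, default=0)

# What a user would naturally type on the command line.
argv = ['--n_clusters', '5']

# parse_known_args returns unrecognized tokens instead of exiting.
typo_args, typo_extra = typo_parser.parse_known_args(argv)
fixed_args, fixed_extra = fixed_parser.parse_known_args(argv)

print(typo_args.n_custers, typo_extra)     # default kept, flag left over
print(fixed_args.n_clusters, fixed_extra)  # value parsed, nothing left over
```

Note that renaming the flag also renames the attribute on the parsed namespace, so any code reading args.n_custers would need the same change.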
Hi.
I'm excited to try this tool out. Just wondering how it handles N's?
Liam
Hi,
I'm trying to test your tool on some COVID sequences downloaded from GISAID. I put 500 sequences in FASTA format in a folder called 'fas' and got the error 'File name too long', so I renamed them 1-500, but the error persists.
Just to be clear: I have a folder called fas. In it are 500 FASTA files, called 1.fa, 2.fa, etc.
They are formatted as such:
head fas/1.fa
>1.B_1
ACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCAC
TCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACA
I ran the script as shown below (on Ubuntu with Python 3.8.5) and got the following error:
build_dp.py --data_path = fas
/home/binfie1/binfiebin/DeLUCS/src/build_dp.py: line 12:
This script builds a dataset in pickle
format from a folder with FASTA files. The
desired label of the file must be in the file ID
after the accession number separated by a dot.
:param dataset: Name of the Dataset.
:param data_path: Path of the folder with the sequences.
:returns: None
Example: python build_dp.py --data_path = '../data/Influenza'
: File name too long
/home/binfie1/binfiebin/DeLUCS/src/build_dp.py: line 14: import: command not found
from: can't read /var/mail/Bio
/home/binfie1/binfiebin/DeLUCS/src/build_dp.py: line 16: import: command not found
/home/binfie1/binfiebin/DeLUCS/src/build_dp.py: line 17: import: command not found
/home/binfie1/binfiebin/DeLUCS/src/build_dp.py: line 20: syntax error near unexpected token `('
/home/binfie1/binfiebin/DeLUCS/src/build_dp.py: line 20: `def replace(seq):'
It seems as though there are multiple import errors.
Any help would be appreciated,
Liam
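A note on the output above: messages such as "import: command not found" and "from: can't read /var/mail/Bio" are printed by the shell, not by Python, which suggests the script was executed directly as a shell script rather than through the interpreter (for example with python build_dp.py ...). Separately, the spaces around "=" may matter. The sketch below assumes build_dp.py parses --data_path with argparse, as the other scripts quoted in these reports do; the build_parser helper is a hypothetical stand-in, not the script's actual code.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the parser in build_dp.py; only the
    # --data_path flag name is taken from the report above.
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_path', action='store', type=str, default=None)
    return parser


parser = build_parser()

# "--data_path=fas" reaches Python as a single token.
ok_args, ok_extra = parser.parse_known_args(['--data_path=fas'])

# "--data_path = fas" is split by the shell into three tokens.
bad_args, bad_extra = parser.parse_known_args(['--data_path', '=', 'fas'])

print(ok_args.data_path, ok_extra)    # the folder name, nothing left over
print(bad_args.data_path, bad_extra)  # "=" taken as the path, "fas" left over
```

Under these assumptions, writing the flag without spaces (--data_path=fas) or with a single space and no "=" (--data_path fas) would pass the folder name as intended.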
Hi,
I'm getting this error:
python TrainDeLUCS.py --data_dir=/home/binfie1/liam_dev/delucs/pairs --out_dir=/home/binfie1/liam_dev/delucs/train1
Traceback (most recent call last):
File "TrainDeLUCS.py", line 193, in <module>
main()
File "TrainDeLUCS.py", line 127, in main
x_train, x_test, y_test = pickle.load(open(filename, 'rb'))
NotADirectoryError: [Errno 20] Not a directory: '/home/binfie1/liam_dev/delucs/pairs/testing_data.p'
It seems to come from trying to open the output of get_pairs.py, which I ran as:
python get_pairs.py --data_path=/home/binfie1/liam_dev/delucs/train --k=6 --modify='mutation' --output=/home/binfie1/liam_dev/delucs/pairs
............computing learning pairs................
......saving mutated pairs.....
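NotADirectoryError (Errno 20) is raised when a component in the middle of a path is a regular file. Here that would mean /home/binfie1/liam_dev/delucs/pairs is itself a file, presumably the one written by get_pairs.py via --output, while TrainDeLUCS.py treats --data_dir as a folder containing testing_data.p. The following sketch shows one way to check this before training; check_data_dir is a hypothetical helper, and the file name is taken from the traceback above.

```python
import os
import tempfile


def check_data_dir(data_dir, expected='testing_data.p'):
    """Return a short diagnosis of a --data_dir value (hypothetical helper)."""
    if os.path.isfile(data_dir):
        return 'data_dir is a file, not a directory'
    if not os.path.isdir(data_dir):
        return 'data_dir does not exist'
    if not os.path.isfile(os.path.join(data_dir, expected)):
        return 'missing ' + expected
    return 'ok'


# Reproduce the situation in a temporary folder: "pairs" is a plain file.
with tempfile.TemporaryDirectory() as tmp:
    pairs = os.path.join(tmp, 'pairs')
    with open(pairs, 'wb') as fh:
        fh.write(b'')
    print(check_data_dir(pairs))
    try:
        # Same access pattern as in the traceback: <file>/testing_data.p
        open(os.path.join(pairs, 'testing_data.p'), 'rb')
    except OSError as err:
        # On Linux this is expected to be NotADirectoryError.
        print(type(err).__name__)
```

If the diagnosis is that data_dir is a file, one option to try is moving that file into a folder under the name testing_data.p and pointing --data_dir at the folder; whether the file's contents match what TrainDeLUCS.py expects is not confirmed by the report.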