leaemiliepradier / plasforest
A random forest classifier to identify contigs of plasmid origin in contig and scaffold genomes
License: GNU General Public License v3.0
PlasForest: a homology-based random forest classifier for plasmid identification.
(C) Lea Pradier, Tazzio Tissot, Anna-Sophie Fiston-Lavier, Stephanie Bedhomme. 2020.
Traceback (most recent call last):
File "PlasForest.py", line 311, in <module>
main(sys.argv[1:])
File "PlasForest.py", line 147, in main
finalfile = plasforest_predict(features, showFeatures, besthits, verbose, attributed_IDs, attributed_identities, nthreads)
File "PlasForest.py", line 289, in plasforest_predict
plasforest.n_jobs = int(nthreads)
NameError: name 'plasforest' is not defined
Hi!
Could you please provide guidance on installing PlasForest in a conda environment? I tried, but after installing all the Python dependencies with pip and running conda install -c bioconda blast, I could not download the database of plasmid sequences. Thanks.
Hi!
When I run "bash database_downloader.sh", I get:
"All sequences were downloaded correctly. Good!
Program finished without error."
but also this line:
"database_downloader.sh: line 41: 24585 Segmentation fault makeblastdb -in plasmid_refseq.fasta -dbtype nucl -parse_seqids"
Is that an error?
Hi!
Do you know what the issue is?
Traceback (most recent call last):
File "train_plasforest.py", line 174, in <module>
main(sys.argv[1:])
File "train_plasforest.py", line 103, in main
blast_launcher(inputfile, blast_table, verbose, nthreads, database)
File "train_plasforest.py", line 126, in blast_launcher
stdout, stderr = blastn_cline()
File "/root/.local/lib/python3.6/site-packages/Bio/Application/__init__.py", line 574, in __call__
raise ApplicationError(return_code, str(self), stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 2 from 'blastn -out /root/PlasForest/test.fasta_blast.out -outfmt 6 -query /root/PlasForest/test.fasta -db plasmid_refseq.fasta -evalue 0.001 -num_threads 1', message 'BLAST Database error: No alias or index file found for nucleotide database [plasmid_refseq.fasta] in search path [/root/PlasForest::]'
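The message above says blastn found no alias or index file for plasmid_refseq.fasta in its search path, which usually means makeblastdb did not finish or the index sits in another directory. A minimal sketch of a check that could run before blastn is launched follows. It is not part of PlasForest; the function name is hypothetical, and the suffix list assumes the usual makeblastdb output for a nucleotide database (a .nin index, or a .nal alias when the database is split into volumes).

```python
from pathlib import Path

# Assumption: makeblastdb -dbtype nucl writes <prefix>.nin for a single-volume
# database, or <prefix>.nal as an alias file when the database is split.
NUCL_INDEX_SUFFIXES = (".nin", ".nal")


def blast_nucl_db_present(db_prefix):
    """Return True if an index or alias file exists for this database prefix."""
    return any(Path(db_prefix + suffix).is_file() for suffix in NUCL_INDEX_SUFFIXES)


if __name__ == "__main__":
    # Hypothetical usage: the prefix is the value passed to blastn with -db.
    if not blast_nucl_db_present("plasmid_refseq.fasta"):
        print("No BLAST index found next to plasmid_refseq.fasta; rerun makeblastdb.")
```

If the check fails, rerunning makeblastdb in the directory that holds plasmid_refseq.fasta and watching for a crash (such as the segmentation fault reported in another issue) would be the next thing to try.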
/mibi/Wanli/anaconda/envs/plasplinev1.4.1/lib/python3.9/site-packages/Bio/Entrez/Parser.py:903: UserWarning: Failed to save epost.dtd at /usr/local/home/hsv709/.config/biopython/Bio/Entrez/DTDs/epost.dtd
warnings.warn("Failed to save %s at %s" % (filename, path))
Traceback (most recent call last):
File "/mibi/users/Wanli/test_plasplinev1.4.1/Plaspline/db/db/plasforest/check_and_download_database.py", line 95, in <module>
download_missing(list_missing, email)
File "/mibi/users/Wanli/test_plasplinev1.4.1/Plaspline/db/db/plasforest/check_and_download_database.py", line 77, in download_missing
result = Entrez.read(request)
File "/mibi/Wanli/anaconda/envs/plasplinev1.4.1/lib/python3.9/site-packages/Bio/Entrez/__init__.py", line 508, in read
record = handler.read(handle)
File "/mibi/Wanli/anaconda/envs/plasplinev1.4.1/lib/python3.9/site-packages/Bio/Entrez/Parser.py", line 304, in read
self.parser.ParseFile(handle)
File "/home/conda/feedstock_root/build_artifacts/python-split_1653669926144/work/Modules/pyexpat.c", line 459, in EndElement
File "/mibi/Wanli/anaconda/envs/plasplinev1.4.1/lib/python3.9/site-packages/Bio/Entrez/Parser.py", line 666, in endErrorElementHandler
raise RuntimeError(value)
RuntimeError: Some IDs have invalid value and were omitted. Maximum ID value 18446744073709551615
Dear,
I have installed PlasForest on a cluster and wanted to test the installation, but I get an error.
I don't know how to fix it.
Can you please help me resolve the issue?
Please find the error file attached:
slurm-52074819.out.zip
Cordially,
Azim
Running test_plasforest.sh
errored out:
$ ./test_plasforest.sh
Starting to test your PlasForest install...
Checking if all the files are here.... OK
We now run PlasForest on the test dataset... File "PlasForest.py", line 6
SyntaxError: Non-ASCII character '\xc3' in file PlasForest.py on line 6, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
We now run PlasForest on the test dataset... ERROR
I'm running Python 3.7.2. I get this same error whether I keep the PlasForest.py shebang as-is or change it to /usr/bin/env python3.
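The "Non-ASCII character ... but no encoding declared" SyntaxError is raised by Python 2, since Python 3 reads source files as UTF-8 by default. So the script is probably being started by a Python 2 interpreter even though Python 3.7.2 is installed. A small sketch for checking which interpreter a given command actually starts is below. The function names are hypothetical, and it assumes the test script calls a `python` command that may resolve differently from `python3`. Save it as a separate file and run it with the same command the test script uses.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the running interpreter is Python 3 or newer."""
    return version_info[0] >= 3


def describe_interpreter(version_info=sys.version_info, executable=sys.executable):
    """Return a short description of the interpreter running this file."""
    return "Python %d.%d at %s" % (version_info[0], version_info[1], executable)


if __name__ == "__main__":
    # Written to run under both Python 2 and Python 3.
    print(describe_interpreter())
    if not is_python3():
        print("This interpreter is Python 2; PlasForest.py needs Python 3.")
```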
Hi. I'm having a problem. When I try to run my command (python3 PlasForest.py -i /home/pedro/Guaymas_C/Analyses/GuaymasC.fasta) or the test_plasforest.sh script, I receive the same error:
I'm using Python version 3.6.0
Traceback (most recent call last):
File "PlasForest.py", line 28, in <module>
import pandas as pd
File "/home/pedro/miniconda3/envs/PlastForest_env/lib/python3.6/site-packages/pandas/__init__.py", line 121, in <module>
from pandas.core.computation.api import eval
File "/home/pedro/miniconda3/envs/PlastForest_env/lib/python3.6/site-packages/pandas/core/computation/api.py", line 3, in <module>
from pandas.core.computation.eval import eval
File "/home/pedro/miniconda3/envs/PlastForest_env/lib/python3.6/site-packages/pandas/core/computation/eval.py", line 12, in <module>
from pandas.core.computation.engines import _engines
File "/home/pedro/miniconda3/envs/PlastForest_env/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 9, in <module>
from pandas.core.computation.ops import _mathops, _reductions
File "/home/pedro/miniconda3/envs/PlastForest_env/lib/python3.6/site-packages/pandas/core/computation/ops.py", line 19, in <module>
from pandas.core.computation.scope import _DEFAULT_GLOBALS
File "/home/pedro/miniconda3/envs/PlastForest_env/lib/python3.6/site-packages/pandas/core/computation/scope.py", line 17, in <module>
from pandas.compat.chainmap import DeepChainMap
File "/home/pedro/miniconda3/envs/PlastForest_env/lib/python3.6/site-packages/pandas/compat/chainmap.py", line 1, in <module>
from typing import ChainMap, MutableMapping, TypeVar, cast
ImportError: cannot import name 'ChainMap'
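A likely cause of this one: the Python documentation lists typing.ChainMap as new in 3.5.4 and 3.6.1, so Python 3.6.0 does not have it and the installed pandas cannot be imported. A minimal sketch of that version check follows (the function name is hypothetical). Moving the environment to a later 3.6.x release, or to a newer Python, should avoid the ImportError.

```python
import sys


def typing_has_chainmap(version_info=sys.version_info):
    """Return True if this Python version is documented to ship typing.ChainMap."""
    version = tuple(version_info[:3])
    if version[:2] == (3, 5):
        # Backported to the 3.5 series in 3.5.4.
        return version >= (3, 5, 4)
    # Available from 3.6.1 onwards.
    return version >= (3, 6, 1)


if __name__ == "__main__":
    print("typing.ChainMap expected:", typing_has_chainmap())
```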
Hi,
Thank you for the awesome package. I would like to try it on my bacterial metagenomic sequences. However, when I try it with the test data I get the following error:
conda activate plasforest-1.4
bash test_plasforest.sh
Starting to test your PlasForest install...
Checking if all the files are here.... OK
We now run PlasForest on the test dataset...Traceback (most recent call last):
File "PlasForest.py", line 27, in <module>
from sklearn.ensemble import RandomForestClassifier
File "/home/user/.local/lib/python3.8/site-packages/sklearn/ensemble/__init__.py", line 7, in <module>
from ._forest import RandomForestClassifier
File "/home/user/.local/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 56, in <module>
from ..tree import (DecisionTreeClassifier, DecisionTreeRegressor,
File "/home/user/.local/lib/python3.8/site-packages/sklearn/tree/__init__.py", line 6, in <module>
from ._classes import BaseDecisionTree
File "/home/user/.local/lib/python3.8/site-packages/sklearn/tree/_classes.py", line 40, in <module>
from ._criterion import Criterion
File "sklearn/tree/_splitter.pxd", line 34, in init sklearn.tree._criterion
File "sklearn/tree/_tree.pxd", line 37, in init sklearn.tree._splitter
File "sklearn/neighbors/_quad_tree.pxd", line 55, in init sklearn.tree._tree
File "/home/user/.local/lib/python3.8/site-packages/sklearn/neighbors/__init__.py", line 17, in <module>
from ._nca import NeighborhoodComponentsAnalysis
File "/home/user/.local/lib/python3.8/site-packages/sklearn/neighbors/_nca.py", line 22, in <module>
from ..decomposition import PCA
File "/home/user/.local/lib/python3.8/site-packages/sklearn/decomposition/__init__.py", line 17, in <module>
from .dict_learning import dict_learning
File "/home/user/.local/lib/python3.8/site-packages/sklearn/decomposition/dict_learning.py", line 4, in <module>
from . import _dict_learning
File "/home/user/.local/lib/python3.8/site-packages/sklearn/decomposition/_dict_learning.py", line 21, in <module>
from ..linear_model import Lasso, orthogonal_mp_gram, LassoLars, Lars
File "/home/user/.local/lib/python3.8/site-packages/sklearn/linear_model/__init__.py", line 12, in <module>
from ._least_angle import (Lars, LassoLars, lars_path, lars_path_gram, LarsCV,
File "/home/user/.local/lib/python3.8/site-packages/sklearn/linear_model/_least_angle.py", line 30, in <module>
method='lar', copy_X=True, eps=np.finfo(np.float).eps,
File "/home/user/.local/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
We now run PlasForest on the test dataset... ERROR
(plasforest-1.4)[user@user1 PlasForest-1.4]$
It may be due to a compatibility issue: scikit-learn=0.22.2 requires NumPy (>= 1.11.0).
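The lower bound quoted above is satisfied, so the problem looks like the opposite: the installed NumPy is too new for this scikit-learn. The np.float alias was deprecated in NumPy 1.20 (as the message says) and removed in NumPy 1.24, while scikit-learn 0.22.2 still uses it. A rough sketch of that comparison is below, using only the standard library. The helper names are hypothetical and the version parsing is deliberately simplistic.

```python
def parse_release(version):
    """Turn a version string such as '1.24.3' or '1.24.0rc1' into a tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def numpy_lacks_float_alias(numpy_version):
    """Return True for NumPy releases where the np.float alias has been removed."""
    return parse_release(numpy_version) >= (1, 24)


if __name__ == "__main__":
    for candidate in ("1.19.5", "1.23.5", "1.24.0"):
        print(candidate, "lacks np.float:", numpy_lacks_float_alias(candidate))
```

Installing a NumPy release older than 1.24 in the same environment as scikit-learn 0.22.2 should avoid the AttributeError. Note also that the traceback paths point to /home/user/.local rather than the conda environment, so the packages being imported may not be the ones installed in plasforest-1.4.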
Hi!
Do you know what the issue is?
Traceback (most recent call last):
File "/home/projects/ku_00041/apps/wanli/F_pipeline/db/plasforest/PlasForest.py", line 236, in <module>
main(sys.argv[1:])
File "/home/projects/ku_00041/apps/wanli/F_pipeline/db/plasforest/PlasForest.py", line 114, in main
blast_launcher(tmp_fasta, blast_table, verbose, nthreads)
File "/home/projects/ku_00041/apps/wanli/F_pipeline/db/plasforest/PlasForest.py", line 161, in blast_launcher
stdout, stderr = blastn_cline()
File "/home/projects/ku_00041/apps/wanli/F_pipeline/conda_envs/ceb528a9/lib/python3.8/site-packages/Bio/Application/__init__.py", line 569, in __call__
raise ApplicationError(return_code, str(self), stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code -11 from 'blastn -out assmebly_res/SRR2145291_contigs_1kb.fasta_blast.out -outfmt 6 -query assmebly_res/SRR2145291_contigs_1kb.fasta_tmp.fasta -db plasmid_refseq.fasta -evalue 0.001 -num_threads 30'
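A note on reading "return code -11": assuming the value follows the convention of Python's subprocess module, a negative return code -N means the child process was killed by signal N, and signal 11 on Linux is SIGSEGV. That would mean blastn itself crashed with a segmentation fault rather than reporting a usage error. A small sketch for decoding such codes follows (the function name is hypothetical).

```python
import signal


def describe_return_code(return_code):
    """Explain a subprocess-style return code in words."""
    if return_code < 0:
        try:
            name = signal.Signals(-return_code).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal %d (%s)" % (-return_code, name)
    if return_code == 0:
        return "exited normally"
    return "exited with status %d" % return_code


if __name__ == "__main__":
    print(describe_return_code(-11))
    print(describe_return_code(2))
```

A crash like this is more likely to come from the BLAST installation or a damaged database than from PlasForest's Python code, so rebuilding the database with makeblastdb or trying a different BLAST version may be worth a try.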
Hello,
I have executed database_downloader.sh, but after it downloads all the records it does not finish, but starts the process over again. This is a sample of what is shown in the Linux terminal:
Downloading record 34401 to 34600 of 34701
Downloading record 32201 to 32400 of 34701
Downloading record 33201 to 33400 of 34701
Downloading record 34601 to 34701 of 34701
Downloading record 33401 to 33600 of 34701
Checking for sequences that did not download... Please wait.
Downloading accession 1 to 34701 of 34701
WARNING: Master record found and removed: NZ_CBTO000000000.1.
All sequences were downloaded correctly. Good!
Program finished without error.
Downloading record 4801 to 5000 of 34701
Downloading record 7201 to 7400 of 34701
Downloading record 1201 to 1400 of 34701
Downloading record 3601 to 3800 of 34701
The download stops only if it is manually interrupted, and running test_plasforest.sh then returns the following error:
ERROR: You must first download the plasmid database by using database_downloader.sh
(plasforest)
Do you know what could be causing this error and how to solve it?
Thanks!
Hello,
I am trying to install the required Python packages, but I am getting an error with scikit-learn; this is sometimes due to incompatibilities with other packages that were installed without specifying versions.
Could you please provide the exact versions of the python packages that you have installed?
Thank you.
Best,
Susana
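For anyone who has a working install and wants to share exact versions in reply to a request like this, the sketch below prints them using only the standard library (Python 3.8+). The package list is an assumption based on the imports visible in the tracebacks in these issues, not an official requirements list.

```python
from importlib import metadata

# Assumption: distribution names guessed from the imports seen in the tracebacks.
PACKAGES = ("numpy", "pandas", "scikit-learn", "biopython")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print("%s==%s" % (name, version) if version else "%s (not installed)" % name)
```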
thank you
Hi, I have run the test at the end of the installation and everything seemed to work. Indeed, I am able to call the program, however an error occurs as well:
PlasForest: a homology-based random forest classifier for plasmid identification.
(C) Lea Pradier, Tazzio Tissot, Anna-Sophie Fiston-Lavier, Stephanie Bedhomme. 2020.
Error: cannot find the path to the .sav file
Any idea what is going on or how it can be fixed?
Thanks
Could we have a command-line option to set the location of plasmid_refseq.fasta and plasforest.sav, rather than having them hardcoded to a certain location?
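As an illustration of what this request could look like, here is a minimal argparse sketch. The `--database` and `--model` options are hypothetical and do not exist in PlasForest as far as these issues show; only `-i` appears in the commands quoted here. The defaults keep the current file names so existing calls would behave as before.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical options for the database and model paths."""
    parser = argparse.ArgumentParser(description="Sketch of path options for PlasForest")
    parser.add_argument("-i", "--input", required=True, help="input FASTA file")
    # Hypothetical option: prefix of the BLAST database built from the plasmid sequences.
    parser.add_argument("--database", default="plasmid_refseq.fasta",
                        help="path to the plasmid BLAST database")
    # Hypothetical option: location of the pickled random forest model.
    parser.add_argument("--model", default="plasforest.sav",
                        help="path to the trained model file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input, args.database, args.model)
```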
Hello!
When I try this :
for file in *.fna; do python ../../../softwares/PlasForest/PlasForest.py -b -i $file -o ${file%%.fna}.csv --threads 8; done
I get this for each fasta file:
Traceback (most recent call last):
File "../../../softwares/PlasForest/PlasForest.py", line 45, in <module>
plasforest = pickle.load(open("plasforest.sav","rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'plasforest.sav'
Note: the test turned out well.
Please help me.
Thanks in advance,
Benjamin Leyton
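The traceback shows the model being opened with a relative path, open("plasforest.sav","rb"), so it is looked up in the current working directory rather than next to PlasForest.py. That would explain why the test passes when run from the install directory while the loop run from the data directory fails. A minimal sketch of resolving the file relative to the script instead is below. The helper names are hypothetical and this is not the authors' code.

```python
import os
import pickle


def path_beside_script(filename, script_path):
    """Return the absolute path of filename in the directory that holds script_path."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, filename)


def load_model(script_path, filename="plasforest.sav"):
    """Load the pickled model stored next to the script, wherever it is called from."""
    model_path = path_beside_script(filename, script_path)
    if not os.path.isfile(model_path):
        raise FileNotFoundError("model file not found: " + model_path)
    with open(model_path, "rb") as handle:
        return pickle.load(handle)
```

Inside the script this would be called as load_model(__file__). Without changing any code, a workaround that may help is to cd into the PlasForest directory and pass the input and output files with their full paths.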