clb21565 / mobileog-db

code repo for mobileOG-db

License: GNU General Public License v3.0

Python 54.78% Shell 9.67% R 35.55%

database mobile-genetic-elements phage plasmid transposon integrative-conjugative-elements mge-detection

mobileog-db's Issues

Output error

Hello!

I am getting the error below after I run ./mobileOGs-pl-kyanite.sh -i NFEBO18_contigs_1000.fasta -d mobileOG-db-beatrix-1.6.dmnd -m mobileOG-db-beatrix-1.6.All.csv -k 15 -e 1e-20 -p 90 -q 90 > sample.txt.
Can you help me with solving this issue?

Error message:

Error: Invalid output field: qtitle
Traceback (most recent call last):
File "/Users/yanack/kadir/Databases/mobileOG-db/mobileOGs-pl-kyanite.py", line 19, in <module>
df_OUT=pd.read_csv(args.i,sep="\t")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/util/_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 605, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1442, in __init__
self._engine = self._make_engine(f, self.engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine
self.handles = get_handle(
^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/io/common.py", line 856, in get_handle
handle = open(
^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'NFEBO18_contigs_1000.fasta.tsv'

Thanks,
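
A note on the traceback above: the `FileNotFoundError` looks like a follow-on failure. DIAMOND stopped at `Invalid output field: qtitle`, so the `.tsv` that the Python script expects was presumably never written. Below is a minimal sketch of a guard that reports this more directly. `require_file` is a hypothetical helper, not part of mobileOG-db.

```python
import os


def require_file(path):
    """Return path unchanged if it exists, otherwise exit with a readable hint.

    Hypothetical helper for illustration; not part of the mobileOG-db scripts.
    """
    if not os.path.isfile(path):
        raise SystemExit(
            f"{path} was not found. The DIAMOND step probably failed; "
            "check the messages it printed before this point."
        )
    return path
```

With such a guard, the read in the script could become `pd.read_csv(require_file(args.i), sep="\t")`, so a failed search ends with one line instead of a long pandas traceback.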

Product descriptions for functional annotations

First of all, thanks a lot for, and congratulations on, this great resource of MGE proteins! This is an important effort towards a systematic ordering of MGE-related proteins and functions.

Also, this might be a wonderful resource for functional annotation pipelines such as Bakta. However, I could find only gene symbols but no related product descriptions, which are required for such purposes. Have I merely overlooked them somewhere, or are they actually lacking? And if they are lacking, would it be possible to provide them?

Again, thanks a lot!

Not able to make diamond database

I am stuck at this step.

Make Diamond Database:
diamond makedb --in mobileOG-db-beatrix-1.X.All.faa -d mobileOG-db-beatrix-1.X.dmnd

I am not able to find mobileOG-db-beatrix-1.X.dmnd.
Do I first have to download a DIAMOND database to execute the above command, or am I doing something wrong?
diamond makedb --in mobileOG-db-beatrix-1.X.All.faa -d mobileOG-db-beatrix-1.X.dmnd
The error is:
diamond v2.0.15.153 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 40
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database input file: mobileOG-db-beatrix-1.X.All.faa
Opening the database file... No such file or directory
[0s]
Error: Error calling stat on file mobileOG-db-beatrix-1.X.All.faa
Opening the database file... No such file or directory
[0s]
Please help in solving this error. Thank you.
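
Judging from the file names used elsewhere in these issues (for example `mobileOG-db-beatrix-1.6.All.csv`), the `1.X` in the command appears to be a placeholder for the downloaded release number, and `makedb` needs the real `.faa` file to be present. The sketch below finds that file and builds the matching command. The glob pattern and the helper name `build_makedb_command` are assumptions made for illustration.

```python
import glob
import os


def build_makedb_command(directory="."):
    """Build a `diamond makedb` argument list for the .faa file in directory.

    Assumes the FASTA follows the naming seen in this thread, such as
    mobileOG-db-beatrix-1.6.All.faa. The pattern is a guess, not a guarantee.
    """
    pattern = os.path.join(directory, "mobileOG-db-beatrix-*.faa")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no file matching {pattern}")
    faa = matches[0]
    # Name the database after the FASTA, swapping the extension.
    database = os.path.splitext(faa)[0] + ".dmnd"
    return ["diamond", "makedb", "--in", faa, "-d", database]
```

The returned list can be passed to `subprocess.run(..., check=True)` once the FASTA has been downloaded and unpacked into the working directory.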

Error: Protein sequences expected

Hi,

I have tried to run the tool again today but I get a new error that I was not getting before:

"Error: The sequences are expected to be proteins but only contain DNA letters"

How should I now specify that the input FASTA is DNA? I use the following parameters, as described in the manual: -k 15 -e 1e-20 -p 90 -q 90.

Thank you,
Asier
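
The message suggests that a file containing only DNA letters reached a step that expects protein sequences. Below is a heuristic sketch for checking which kind of FASTA a file is before starting the search. The function name and the sampling limit are assumptions, and a short protein made up only of A, C, G, T and N residues would be misclassified.

```python
def looks_like_nucleotide(fasta_path, max_residues=10000):
    """Return True if the sampled sequence letters are all nucleotide codes.

    Heuristic only: header lines are skipped and roughly the first
    max_residues letters are inspected.
    """
    nucleotide_letters = set("ACGTUN")
    seen = 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith(">"):
                continue
            if any(letter not in nucleotide_letters for letter in line.upper()):
                return False
            seen += len(line)
            if seen >= max_residues:
                break
    # An empty file is reported as "not nucleotide" rather than guessed at.
    return seen > 0
```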

something wrong with the "mobileOG-pl/mobileOGs-pl-kyanite.py"

When I tried to process the DIAMOND output with the Python script, the following message was reported and I can't resolve it. I hope to get your help, thanks.
python mobileOG-pl/mobileOGs-pl-kyanite.py --o /MobileOG/ --i 1_mobileOG.tsv -m mobileOG-db-beatrix-1.6-All.csv
sys:1: DtypeWarning: Columns (7) have mixed types.Specify dtype option on import or set low_memory=False.
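
For context, `DtypeWarning` is a pandas warning rather than an error, and the message itself names two remedies: pass `dtype` or set `low_memory=False`. Here is a minimal sketch of the second option on a small in-memory table. The helper name and the demo data are made up, and which table in the pipeline triggers the warning is not stated in the report.

```python
import io

import pandas as pd


def read_table_whole(source, sep=","):
    """Read a delimited table in one pass so column types are inferred once."""
    return pd.read_csv(source, sep=sep, low_memory=False)


# Demo data with a column that mixes numbers and text.
demo = io.StringIO("id,value\n1,3\n2,x\n")
table = read_table_whole(demo)
```

The alternative named in the warning is to declare the column type explicitly, for example `dtype={"value": str}` for the demo column above.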

zsh: permission denied:

Hi,
I am trying to run the final step: ./mobileOG-db-main/mobileOG-pl/mobileOGs-pl-kyanite.sh -i S5.fasta -d mobileOG-db-beatrix-1.X.dmnd -m mobileOG-db-beatrix-1.6-All.csv -k 15 -e 1e-20 -p 90 -q 90
but I got this error: zsh: permission denied: ./mobileOG-db-main/mobileOG-pl/mobileOGs-pl-kyanite.sh

Can you help me, please?

thank you
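
A `permission denied` message from the shell for a script path usually means the file lacks the execute bit. The sketch below sets that bit from Python and has the same effect as `chmod u+x` on the script. The helper name `make_executable` is made up for illustration.

```python
import os
import stat


def make_executable(path):
    """Add the owner-execute bit to path and return the resulting mode bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Invoking the script through the interpreter (`bash mobileOGs-pl-kyanite.sh ...`) is another option that does not depend on the execute bit.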

Script names

Hello!

I think I got your pipeline to work with a few modifications. Just checking if I did this correctly and am getting the expected output.

1. python version

In UsageGuidance.md, it says dependencies are python 3.7 with pandas, argparse, and itertools.

So I specified python 3.7 - conda create -n mobileOG-db python=3.7

However, I got this error:

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment -

Specifications:

  - argparse -> python[version='2.6.*|2.7.*|3.4.*|3.5.*|3.6.*']

Your python: python=3.7

Which was fixed by specifying 3.6.15 conda create -n mobileOG-db python=3.6.15

2. unicode decode error

With help from this, I was able to solve these -

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xee in position 2: invalid continuation byte

In mobileOGs-pl-kyanite.py, change line 19 to -

df_OUT=pd.read_csv(args.i,sep="\t",encoding='ascii')

and line 61 to (I could not get it to work with args.m)-

Metadata=pd.read_csv('mobileOG-db-beatrix-1.5.All.csv',encoding='utf-8')

3. script names

I think UsageGuidance.md is not up to date? Is "mobileOG-pl.sh" the same as "mobileOGs-pl-kyanite.sh?"

Within the shell script, mobileOGs-pl.py is then changed to mobileOGs-pl-kyanite.py?

4. executables

The markdown says

mobileOG-pl.sh -i -d mobileOG-db_beatrix-1.X.dmnd -k 15 -e 1e-20 -p 90 -q 90

So I put my fasta file after -i, and made the script executable (chmod u+x mobileOGs-pl-kyanite.sh). Also, everything prints to the terminal, so I also redirected.

The final command looks like this -

./mobileOGs-pl-kyanite.sh -i test.fasta -d mobileOG-db-beatrix-1.5.All.dmnd -k 15 -e 1e-20 -p 90 -q 90 > test.txt

5. verbose

After redirecting, output to terminal is like this... is there a way of turning it off? (there are many thousands of sequences haha)

...
Finding genes in sequence #32268 (1023 bp)...done!
Finding genes in sequence #32269 (2088 bp)...done!
Finding genes in sequence #32270 (4827 bp)...done!
Finding genes in sequence #32271 (1319 bp)...done!
Finding genes in sequence #32272 (5211 bp)...done!
Finding genes in sequence #32273 (1350 bp)...done!
...

6. output

Finally, the output test.fasta.summary.csv looks like this -

$ head -3 test.fasta.summary.csv

,key_0,Specific Contig_x,Bacteriophages,Insertion sequences,Integrative elements,Multiple,Plasmids,Total Number of Hits,Percent Bacteriophages,Percent Insertion sequences,Percent Integrative elements,Percent Plasmids,Percent Multiple,Specific Contig_y,Amount of Unique ORFs
0,S10B_S4_000000000087,S10B_S4_000000000087,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,100.0,0.0,0.0,S10B_S4_000000000087,1
1,S10B_S4_000000000117,S10B_S4_000000000117,1.0,0.0,0.0,0.0,0.0,1.0,100.0,0.0,0.0,0.0,0.0,S10B_S4_000000000117,1

Is this expected?

Thank you!!
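
Regarding item 2 above (the `UnicodeDecodeError`): rather than hard-coding one encoding, a reader could try several in turn. A minimal sketch follows. The helper name and the fallback order are assumptions, and `latin-1` accepts any byte sequence, so it always succeeds but may mis-render some characters.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "latin-1"), **kwargs):
    """Try each encoding in order and return the first successful parse.

    path should be a file path, not an open buffer, so that each attempt
    starts reading from the beginning.
    """
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding, **kwargs)
        except UnicodeDecodeError as error:
            last_error = error
    raise last_error
```

Extra keyword arguments are forwarded to `pd.read_csv`, so the tab-separated table could be read with `read_csv_with_fallback(args.i, sep="\t")`.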

Error when no results found

When running mobileOGs-pl-kyanite.py, if no results are found an error is thrown (below). This may confuse users into thinking there was a problem when the only issue is that no results were found.

The script could check if the file is empty and exit early (around line 15).
df_OUT=pd.read_csv(args.i,sep="\t")

Error (when file empty):

Reported 0 pairwise alignments, 0 HSPs.
0 queries aligned.
Traceback (most recent call last):
File "/app/mobileOG-db/bin/mobileOGs-pl-kyanite.py", line 19, in <module>
df_OUT=pd.read_csv(args.i,sep="\t")
File "/opt/conda/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 575, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
self._engine = self._make_engine(f, self.engine)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1235, in _make_engine
return mapping[engine](f, **self.options)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 75, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 551, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
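
The early exit suggested above could be sketched as follows, assuming the only goal is to tell "no hits" apart from a real failure. `read_hits_or_none` is a hypothetical name, not a function in the repository.

```python
import pandas as pd


def read_hits_or_none(path):
    """Return the DIAMOND table as a DataFrame, or None if the file is empty."""
    try:
        return pd.read_csv(path, sep="\t")
    except pd.errors.EmptyDataError:
        # pandas raises this when the file has no columns to parse,
        # which is what an empty search result produces.
        return None
```

A caller could then print a short message and stop when the function returns `None`, instead of surfacing the traceback.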

mobileOGs-pl-kyanite.py not working

Since the update yesterday, mobileOGs-pl-kyanite.py no longer works. Line 25 is split on 2 lines.

When I manually fix the split line the program runs but no longer returns the Alignment output file: *.mobileOG.Alignment.Out.csv

Error: target with no hsps

Hi Connor,

I ran the scripts on 2 different databases, and while it worked perfectly on one of them, on the other I got the following error:

Computing alignments... Error: generate_output: target with no hsps.
Empty diamond output. No hits returned from diamond search.

In that case (it is a phage-plasmid database), no output files were generated. Should I try changing some parameters? I currently use -k 15 -e 1e-20 -p 90 -q 90, and I would also like to know which other flags are supported (such as --id).

License?

Hello! Thank you for putting together this database, it is exactly what I need to try to improve the annotation of proteins on predicted MGEs. However, can you set up a license in the repo? Just to be on the safe side.

Thanks in advance!

Alejandra.

Output of mobileOG

Hi,

Thanks for the great database and the scripts provided.

I have a few general questions about the output files and recommendations on how to interpret them:

First, would you recommend using the full database or only the version containing manually curated + homologue sequences? Is the classification of the remaining proteins (keyword data) reliable? I tried your tool on a few contigs using the curated + homology DB, and some proteins are given NA as the mobileOG Category (even some manually curated sequences). Why does this happen?
Also, I would like a more detailed explanation of the output files, e.g. contig_file_summary.csv: I do not completely understand this file. I would expect each row to correspond to one contig (although contig names are not displayed), but in my case there are fewer rows than contigs.
From a list of potential phage/viral contigs, I am interested in determining which contigs could be plasmids or other mobile elements so that I can discard them, as I want to keep only phage sequences. Which annotations (or how many) should be present in a contig to confidently classify it as a mobile element or as a plasmid?