
mobileog-db's People

Contributors

balaram26, clb21565, jamesm224


mobileog-db's Issues

Output error

Hello!

I get the error below after running:
./mobileOGs-pl-kyanite.sh -i NFEBO18_contigs_1000.fasta -d mobileOG-db-beatrix-1.6.dmnd -m mobileOG-db-beatrix-1.6.All.csv -k 15 -e 1e-20 -p 90 -q 90 > sample.txt
Can you help me solve this issue?

Error message:

Error: Invalid output field: qtitle
Traceback (most recent call last):
File "/Users/yanack/kadir/Databases/mobileOG-db/mobileOGs-pl-kyanite.py", line 19, in <module>
df_OUT=pd.read_csv(args.i,sep="\t")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/util/_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 605, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1442, in __init__
self._engine = self._make_engine(f, self.engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine
self.handles = get_handle(
^^^^^^^^^^^
File "/Users/yanack/anaconda3/envs/mamba/envs/mobileOG-db/lib/python3.11/site-packages/pandas/io/common.py", line 856, in get_handle
handle = open(
^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'NFEBO18_contigs_1000.fasta.tsv'

Thanks,
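The last line of the traceback shows that `NFEBO18_contigs_1000.fasta.tsv` did not exist when the Python script started, and the first line shows that an earlier step had already reported `Invalid output field: qtitle`. A minimal sketch of a guard that makes this order of events clearer, assuming a hypothetical helper `check_alignment_file` that is not part of mobileOG-pl:

```python
from pathlib import Path


def check_alignment_file(path):
    """Return the path if the alignment table exists, else raise a clear error.

    Hypothetical helper for illustration; not part of mobileOG-pl.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(
            f"{p} was not found. The upstream DIAMOND step probably failed; "
            "check its messages before reading the Python traceback."
        )
    return p
```

Calling it on the expected `.tsv` before `pd.read_csv` would replace the long pandas traceback with a single message pointing at the earlier step.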

Product descriptions for functional annotations

First of all, thanks a lot for, and congratulations on, this great resource of MGE proteins! This is an important effort towards a systematic ordering of MGE-related proteins and functions.

Also, this might be a wonderful resource for functional annotation pipelines such as Bakta. However, I could find only gene symbols, but no related product descriptions, which are required for such purposes. Have I merely overlooked them somewhere, or are they actually lacking? And if they are lacking, would it be possible to provide them?

Again, thanks a lot!

Not able to make diamond database

I am stuck at this step.

Make Diamond Database:
diamond makedb --in mobileOG-db-beatrix-1.X.All.faa -d mobileOG-db-beatrix-1.X.dmnd

I am not able to find mobileOG-db-beatrix-1.X.dmnd.
Do I have to download a diamond database first to execute the above command, or am I doing something wrong?
diamond makedb --in mobileOG-db-beatrix-1.X.All.faa -d mobileOG-db-beatrix-1.X.dmnd
The error is:
diamond v2.0.15.153 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 40
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database input file: mobileOG-db-beatrix-1.X.All.faa
Opening the database file... No such file or directory
[0s]
Error: Error calling stat on file mobileOG-db-beatrix-1.X.All.faa
Opening the database file... No such file or directory
[0s]

Please help in solving this error. Thank you.
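The log says the input file `mobileOG-db-beatrix-1.X.All.faa` could not be opened. Other reports on this page use names such as `mobileOG-db-beatrix-1.6.dmnd`, so the `1.X` is most likely a placeholder for the release number, and the `.dmnd` file is what `diamond makedb` writes rather than something to download. A small sketch, assuming a hypothetical helper `build_makedb_command` and assuming the downloaded protein FASTA matches `mobileOG-db-beatrix-*.faa`, that builds the command from the file actually present:

```python
from pathlib import Path


def build_makedb_command(directory="."):
    """Build a `diamond makedb` command for the .faa file found in `directory`.

    Hypothetical helper; the file-name pattern is an assumption based on the
    names quoted in these reports.
    """
    matches = sorted(Path(directory).glob("mobileOG-db-beatrix-*.faa"))
    if not matches:
        raise FileNotFoundError(
            f"No mobileOG-db-beatrix-*.faa file in {directory}; "
            "download and unpack the database first."
        )
    faa = matches[-1]  # last name in sorted order
    db = faa.with_suffix(".dmnd")  # e.g. ...-1.6.All.faa -> ...-1.6.All.dmnd
    return ["diamond", "makedb", "--in", str(faa), "-d", str(db)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once DIAMOND is installed.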

Error: Protein sequences expected

Hi,

I have tried to run the tool again today but I get a new error that I was not getting before:

"Error: The sequences are expected to be proteins but only contain DNA letters"

How should I now specify that the input FASTA is DNA? I use the following parameters, as described in the manual: -k 15 -e 1e-20 -p 90 -q 90.

Thank you,
Asier
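To see which file triggers this message, it can help to check whether a FASTA file contains only nucleotide letters. The following is a rough heuristic sketch, with `looks_like_nucleotide` as a hypothetical name; a protein made up only of A, C, G, T, U and N would be misclassified, so treat the result as a hint.

```python
def looks_like_nucleotide(lines, max_residues=10000):
    """Heuristic: True if every sampled sequence letter is a nucleotide code."""
    nucleotide = set("ACGTUN")
    seen = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        if not set(line.upper()) <= nucleotide:
            return False  # found a letter that is not a nucleotide code
        seen += len(line)
        if seen >= max_residues:
            break  # enough residues sampled
    return seen > 0
```

Usage would be along the lines of `with open("input.fasta") as fh: print(looks_like_nucleotide(fh))`, where the file name is only an example.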

something wrong with the "mobileOG-pl/mobileOGs-pl-kyanite.py"

When I tried to process the diamond output with the .py file, the following was reported and I can't solve it. I hope to get your help, thanks.
python mobileOG-pl/mobileOGs-pl-kyanite.py --o /MobileOG/ --i 1_mobileOG.tsv -m mobileOG-db-beatrix-1.6-All.csv
sys:1: DtypeWarning: Columns (7) have mixed types.Specify dtype option on import or set low_memory=False.
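The `DtypeWarning` is a warning from pandas rather than an error, and the message itself names two remedies. A minimal sketch of both, using a tiny stand-in table because the real CSV and its column names are not shown in this report:

```python
import io

import pandas as pd

# Tiny stand-in for a CSV with a mixed-type column (assumed example data).
csv_text = "id,value\n1,10\n2,abc\n"

# Option 1: read the whole file before inferring column types.
df = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: state the type of the problematic column explicitly.
df_str = pd.read_csv(io.StringIO(csv_text), dtype={"value": str})

print(df_str["value"].tolist())
```

Either option would go into the `pd.read_csv` call that loads the file passed with `-m`.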

zsh: permission denied:

Hi
I am trying to run the final step:
./mobileOG-db-main/mobileOG-pl/mobileOGs-pl-kyanite.sh -i S5.fasta -d mobileOG-db-beatrix-1.X.dmnd -m mobileOG-db-beatrix-1.6-All.csv -k 15 -e 1e-20 -p 90 -q 90
but I got this error:
zsh: permission denied: ./mobileOG-db-main/mobileOG-pl/mobileOGs-pl-kyanite.sh

Can you help me, please?

thank you
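A `permission denied` message for a script path usually means the file lacks the execute bit, which `chmod u+x` on the script sets. The same step can be sketched in Python, with `make_user_executable` as a hypothetical helper name:

```python
import os
import stat


def make_user_executable(path):
    """Add the owner-execute bit to `path`, like `chmod u+x path`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
```

For example, `make_user_executable("./mobileOG-db-main/mobileOG-pl/mobileOGs-pl-kyanite.sh")` before running the script again.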

Script names

Hello!

I think I got your pipeline to work with a few modifications. Just checking if I did this correctly and am getting the expected output.

1. python version

In UsageGuidance.md, it says dependencies are python 3.7 with pandas, argparse, and itertools.

So I specified python 3.7 - conda create -n mobileOG-db python=3.7

However, I got this error:

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment -

Specifications:

  - argparse -> python[version='2.6.*|2.7.*|3.4.*|3.5.*|3.6.*']

Your python: python=3.7

This was fixed by specifying 3.6.15: conda create -n mobileOG-db python=3.6.15

2. unicode decode error

With help from this, I was able to solve these -

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xee in position 2: invalid continuation byte

In mobileOGs-pl-kyanite.py, change line 19 to -

df_OUT=pd.read_csv(args.i,sep="\t",encoding='ascii')

and line 61 to (I could not get it to work with args.m)-

Metadata=pd.read_csv('mobileOG-db-beatrix-1.5.All.csv',encoding='utf-8')
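As an alternative to hard-coding one encoding per file, the read can try several encodings in turn. This is only a sketch: `read_table_with_fallback` is a hypothetical helper, and the choice of `latin-1` as a fallback is an assumption, since that codec accepts every byte but may not be the encoding the file was written in.

```python
import pandas as pd


def read_table_with_fallback(path, sep=",", encodings=("utf-8", "latin-1")):
    """Try each encoding in turn; return (DataFrame, encoding that worked)."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, sep=sep, encoding=enc), enc
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next encoding
    if last_error is None:
        raise ValueError("no encodings were given")
    raise last_error
```

The returned encoding name shows which attempt succeeded, so any unexpected fallback is visible rather than silent.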

3. script names

I think UsageGuidance.md is not up to date? Is "mobileOG-pl.sh" the same as "mobileOGs-pl-kyanite.sh"?

Within the shell script, mobileOGs-pl.py is then changed to mobileOGs-pl-kyanite.py?

4. executables

The markdown says

mobileOG-pl.sh -i -d mobileOG-db_beatrix-1.X.dmnd -k 15 -e 1e-20 -p 90 -q 90

So I put my fasta file after -i, and made the script executable (chmod u+x mobileOGs-pl-kyanite.sh). Also, everything prints to the terminal, so I also redirected.

The final command looks like this -

./mobileOGs-pl-kyanite.sh -i test.fasta -d mobileOG-db-beatrix-1.5.All.dmnd -k 15 -e 1e-20 -p 90 -q 90 > test.txt

5. verbose

After redirecting, output to terminal is like this... is there a way of turning it off? (there are many thousands of sequences haha)

...
Finding genes in sequence #32268 (1023 bp)...done!
Finding genes in sequence #32269 (2088 bp)...done!
Finding genes in sequence #32270 (4827 bp)...done!
Finding genes in sequence #32271 (1319 bp)...done!
Finding genes in sequence #32272 (5211 bp)...done!
Finding genes in sequence #32273 (1350 bp)...done!
...
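Since these lines still reach the terminal after `>` has redirected standard output, they are presumably written to standard error. A sketch of discarding stderr while keeping stdout, using a stand-in child process rather than the real pipeline:

```python
import subprocess
import sys

# Stand-in child process: progress goes to stderr, the result to stdout.
child = [
    sys.executable,
    "-c",
    "import sys; print('progress', file=sys.stderr); print('result')",
]

# Capture stdout and discard stderr (the shell equivalent is `2> /dev/null`).
completed = subprocess.run(
    child,
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,
    text=True,
    check=True,
)
print(completed.stdout.strip())
```

On the command line, appending `2> /dev/null` (or `2> progress.log` to keep the messages in a file) to the existing command has the same effect.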

6. output

Finally, the output test.fasta.summary.csv looks like this -

$ head -3 test.fasta.summary.csv

,key_0,Specific Contig_x,Bacteriophages,Insertion sequences,Integrative elements,Multiple,Plasmids,Total Number of Hits,Percent Bacteriophages,Percent Insertion sequences,Percent Integrative elements,Percent Plasmids,Percent Multiple,Specific Contig_y,Amount of Unique ORFs
0,S10B_S4_000000000087,S10B_S4_000000000087,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,100.0,0.0,0.0,S10B_S4_000000000087,1
1,S10B_S4_000000000117,S10B_S4_000000000117,1.0,0.0,0.0,0.0,0.0,1.0,100.0,0.0,0.0,0.0,0.0,S10B_S4_000000000117,1

Is this expected?

Thank you!!
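In the two rows shown, `key_0`, `Specific Contig_x` and `Specific Contig_y` all carry the same contig name, and the `key_0` name and `_x`/`_y` suffixes are what pandas generates when tables are merged. A sketch of tidying the table after loading it, using only the rows quoted above (the new column name `Contig` is an arbitrary choice):

```python
import io

import pandas as pd

# The header and two rows quoted in this report.
sample = (
    ",key_0,Specific Contig_x,Bacteriophages,Insertion sequences,"
    "Integrative elements,Multiple,Plasmids,Total Number of Hits,"
    "Percent Bacteriophages,Percent Insertion sequences,"
    "Percent Integrative elements,Percent Plasmids,Percent Multiple,"
    "Specific Contig_y,Amount of Unique ORFs\n"
    "0,S10B_S4_000000000087,S10B_S4_000000000087,0.0,0.0,1.0,0.0,0.0,1.0,"
    "0.0,0.0,100.0,0.0,0.0,S10B_S4_000000000087,1\n"
    "1,S10B_S4_000000000117,S10B_S4_000000000117,1.0,0.0,0.0,0.0,0.0,1.0,"
    "100.0,0.0,0.0,0.0,0.0,S10B_S4_000000000117,1\n"
)

summary = pd.read_csv(io.StringIO(sample), index_col=0)

# Confirm the three contig columns really are duplicates before dropping two.
duplicates_match = (
    summary["key_0"].equals(summary["Specific Contig_x"])
    and summary["key_0"].equals(summary["Specific Contig_y"])
)

tidy = summary.drop(columns=["Specific Contig_x", "Specific Contig_y"]).rename(
    columns={"key_0": "Contig"}
)
print(duplicates_match)
print(tidy.columns.tolist())
```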

Error when no results found

When running mobileOGs-pl-kyanite.py, if no results are found an error is thrown (below). This may confuse users into thinking there was a problem when the only issue is that no results were found.

The script could check if the file is empty and exit early (around line 15).
df_OUT=pd.read_csv(args.i,sep="\t")

Error (when file empty):

Reported 0 pairwise alignments, 0 HSPs.
0 queries aligned.
Traceback (most recent call last):
File "/app/mobileOG-db/bin/mobileOGs-pl-kyanite.py", line 19, in <module>
df_OUT=pd.read_csv(args.i,sep="\t")
File "/opt/conda/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 575, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
self._engine = self._make_engine(f, self.engine)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1235, in _make_engine
return mapping[engine](f, **self.options)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 75, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 551, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
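The early exit suggested above could be sketched as follows. `read_hits_or_none` is a hypothetical helper, not code from the repository; it relies on `pandas.errors.EmptyDataError`, the exception shown in the traceback.

```python
import pandas as pd


def read_hits_or_none(path):
    """Return the tab-separated hit table, or None if there is nothing to parse."""
    try:
        return pd.read_csv(path, sep="\t")
    except pd.errors.EmptyDataError:
        return None


# Possible use near the top of the script (names follow the snippet above):
#   df_OUT = read_hits_or_none(args.i)
#   if df_OUT is None:
#       sys.exit("No hits returned from the DIAMOND search; nothing to summarise.")
```

Exiting with a short message would make it clear that the run completed and simply found no alignments.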

mobileOGs-pl-kyanite.py not working

Since the update yesterday, mobileOGs-pl-kyanite.py no longer works. Line 25 is split across two lines.

When I manually fix the split line the program runs but no longer returns the Alignment output file: *.mobileOG.Alignment.Out.csv

Error: target with no hsps

Hi Connor,

I ran the scripts in 2 different databases, and while it worked perfectly in one of them, in the other I got the following error:

Computing alignments... Error: generate_output: target with no hsps.
Empty diamond output. No hits returned from diamond search.

In that case (it is a phage-plasmid database) no output files were generated. Maybe I should try changing some parameters? I currently use -k 15 -e 1e-20 -p 90 -q 90, although I would also like to know which other flags are supported (like --id, etc.).

License?

Hello! Thank you for putting together this database, it is exactly what I need to try to improve the annotation of proteins on predicted MGEs. However, can you set up a license in the repo? Just to be on the safe side.

Thanks in advance!

Alejandra.

Output of mobileOG

Hi,

Thanks for the great database and the scripts provided.

I have a few general questions about the output files and recommendations on how to interpret them:

  1. First, would you recommend using the full database or only the version containing manually curated + homologue sequences? Is the classification of the remaining proteins (keyword data) reliable? I have tried your tool on a few contigs using the curated + homology DB, and some proteins are given NA as the mobileOG Category (even some manually curated sequences). Why does this happen?

  2. Also, I would like a more detailed explanation of the output files. E.g. contig_file_summary.csv: I do not completely understand the output of this file. I would expect each row to correspond to one contig (although contig names are not displayed), but in my case there are fewer rows than contigs.

  3. From a list of potential phage/viral contigs, I am interested in determining which of these contigs could be potential plasmids and mobile elements to discard them, as I want to keep only phage sequences. Which annotations (or how many) should be present in a contig to confidently classify it as a mobile element or as a plasmid?

Thank you,
Asier
