raw-lab / mercat2

MerCat2: Python code for versatile k-mer counting and diversity estimation for database-independent property analysis of metaome data

Home Page: https://github.com/raw-lab/mercat2/

License: BSD 3-Clause "New" or "Revised" License

HTML 99.61% Python 0.25% Shell 0.02% CSS 0.02% Jupyter Notebook 0.09%
python diversity plotly multiomics multiomics-data protein dask fastq k-mer-counting k-mer-frequency

mercat2's Introduction

Hi there 👋

Welcome to the RAW Lab

  • 🔭 We are currently working on Microbialites/Stromatolites for NASA, Nitrogenase for USDA, Viral-like particles for industry, and the bat immune system for NIH.
  • 🌱 We are currently learning Rust as a lab.
  • 👯 We are looking to collaborate on anything related to viruses, bioinformatics, computational biology, or synthetic biology.
  • 🤔 We are looking for help with a variety of projects. Send Dr. RAW an email.
  • 💬 Ask us about viruses, bioinformatics, computational biology, or synthetic biology.
  • 📫 How to reach us: Dr. RAW
  • ⚡ Fun fact: There are more viruses on Earth than stars in the observable universe, and more viruses in your mouth than stars in the Milky Way 🌌.

mercat2's People

Contributors

atred, decrevi, raw-lab, raw937, rnmounika

Forkers

binyun-z

mercat2's Issues

Lack of diversity report after the completion of the run

Hey,

I wanted to view some diversity metrics for my dataset, but no file containing them was generated. The program output a protein summary.txt file that contains numbers with no description of what they mean, and the HTML report prompted me to download a .tsv file, which I have not been able to download. Any suggestions, help, or clarification would be welcome.

Thanks

How to find out the k-mers summary of protein fasta

Hello! Mercat2 is a useful tool for finding k-mers, and I'm thankful for it. My question is how to find the k-mer summary of a protein FASTA after running Mercat2 on an .faa file. I only found a TSV file with a column titled 'count', and I couldn't understand what it means. I hope you can answer!
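
For readers with the same question, here is a minimal sketch of what a per-k-mer count generally means: the number of times each overlapping k-mer occurs in the input. That the 'count' column in the MerCat2 TSV holds exactly this quantity is an assumption, and `count_kmers` and the toy sequence are hypothetical illustrations, not MerCat2 code.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer in a sequence with a sliding window."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy protein sequence, made up for illustration only.
counts = count_kmers("MKVLAAMKVL", k=3)
for kmer, count in sorted(counts.items()):
    print(kmer, count)
```

In this toy case the 3-mers MKV and KVL each occur twice and every other 3-mer occurs once, so a table built this way would list one row per distinct k-mer with its number of occurrences.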

Can't perform the analysis

Hello, your tool seems fantastic, but I can't use it properly.
I have some metagenomic read samples (forward and reverse) that I have merged into a single fasta file per sample.
I wanted to calculate the alpha diversity with Mercat2, using a k-mer length of 31, to compare the samples.
But it seems that my workstation can't handle the computational load.
For some samples it works, but when it reaches a 2 GB sample, it complains that a "worker died" and the analysis fails.
Do you have any recommendations?

IndexError: list index out of range

Hi,

I would like to analyze the similarity of a set of genome sequences (Vibrio strains) with mercat2's approach.
I installed mercat2 with mamba as instructed in the repository (using Python 3.10.14) and ran it as follows.
I noticed the error happens only when multiple samples (≥2) are supplied to the program.
How should I deal with this?

best,

=============================================================================
$ cp GCF*fna test/
$ mercat2.py -f test/ -k 3 -n 8 -c 10
.
.
...
GCF_024442235.1 .fna
GCF_028743655.1 .fna
GCF_025917725.1 .fna
GCF_003691505.1 .fna
GCF_009665315.1 .fna
GCF_003691545.1 .fna
GCF_001310575.2 .fna
GCF_022453845.1 .fna
GCF_002887655.1 .fna
GCF_002021755.1 .fna
Time to load 16 files: 8.51 seconds
Checking for large nucleotide files
Processing Nucleotides
Running Mercat2 using 18 cores
Time to count 3-mers: 1.71 seconds

Creating Nucleotide Graphs
/home/kazu/mambaforge/envs/mercat2/lib/python3.10/site-packages/sklearn/metrics/pairwise.py:2317: DataConversionWarning:

Data was converted to boolean for metric dice

/home/kazu/mambaforge/envs/mercat2/lib/python3.10/site-packages/sklearn/metrics/pairwise.py:2317: DataConversionWarning:

Data was converted to boolean for metric jaccard

Error with beta metric: Mahalanobis
Data must be symmetric and cannot contain NaNs.
/home/kazu/mambaforge/envs/mercat2/lib/python3.10/site-packages/sklearn/metrics/pairwise.py:2317: DataConversionWarning:

Data was converted to boolean for metric matching

/home/kazu/mambaforge/envs/mercat2/lib/python3.10/site-packages/sklearn/metrics/pairwise.py:2317: DataConversionWarning:

Data was converted to boolean for metric rogerstanimoto

/home/kazu/mambaforge/envs/mercat2/lib/python3.10/site-packages/sklearn/metrics/pairwise.py:2317: DataConversionWarning:

Data was converted to boolean for metric russellrao

/home/kazu/mambaforge/envs/mercat2/lib/python3.10/site-packages/sklearn/metrics/pairwise.py:2317: DataConversionWarning:

Data was converted to boolean for metric sokalmichener

/home/kazu/mambaforge/envs/mercat2/lib/python3.10/site-packages/sklearn/metrics/pairwise.py:2317: DataConversionWarning:

Data was converted to boolean for metric sokalsneath

/home/kazu/mambaforge/envs/mercat2/lib/python3.10/site-packages/sklearn/metrics/pairwise.py:2317: DataConversionWarning:

Data was converted to boolean for metric yule

Gathering Diversity Metrics
Traceback (most recent call last):
  File "/home/kazu/mambaforge/envs/mercat2/bin/mercat2.py", line 508, in <module>
    mercat_main()
  File "/home/kazu/mambaforge/envs/mercat2/bin/mercat2.py", line 499, in mercat_main
    mercat2_report.merge_tsv(tomerge, outfile)
  File "/home/kazu/mambaforge/envs/mercat2/lib/python3.10/site-packages/mercat2_lib/mercat2_report.py", line 128, in merge_tsv
    kmer = sorted(kmers)[0]
IndexError: list index out of range
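
The last frame of the traceback indexes element 0 of `sorted(kmers)`, which raises exactly this error whenever `kmers` is empty. The sketch below reproduces that behaviour in isolation. Why `kmers` ends up empty in this run is not established by the log, and `first_kmer` is a hypothetical helper showing one possible guard, not part of mercat2.

```python
def first_kmer(kmers):
    """Return the smallest k-mer, or None when the collection is empty."""
    ordered = sorted(kmers)
    if not ordered:
        return None  # Nothing to pick from; the caller decides how to proceed.
    return ordered[0]


# Reproduce the error from the traceback with an empty collection.
try:
    sorted([])[0]
    reproduced = False
except IndexError as err:
    reproduced = True
    message = str(err)

print(reproduced, message)
print(first_kmer([]), first_kmer(["CCA", "AAC"]))
```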

Cannot import name 'ConstantInputWarning' from 'scipy.stats'

Hello, this tool looks great! But when running mercat2 using demo data, I encountered the following error:

  File "/storage/home/hcoda1/1/kbian8/.conda/envs/mercat2/lib/python3.10/site-packages/skbio/stats/distance/_mantel.py", line 16, in <module>
    from scipy.stats import ConstantInputWarning
ImportError: cannot import name 'ConstantInputWarning' from 'scipy.stats' (/storage/home/hcoda1/1/kbian8/.conda/envs/mercat2/lib/python3.10/site-packages/scipy/stats/__init__.py)

SciPy has been updated to 1.8.1, and all dependencies have been installed with matching versions.

No idea how to solve this. May I have your suggestions?

ImportError: cannot import name 'ConstantInputWarning' from 'scipy.stats'

Error due to scikit-bio:

ImportError: cannot import name 'ConstantInputWarning' from 'scipy.stats' (/home/smith/miniconda3/lib/python3.8/site-packages/scipy/stats/__init__.py)

This is likely related to recent installations (like numba) which might have modified versions of numpy and scipy, as previously the problem didn't occur.

Versions:
python : 3.8
scipy : 1.8.1
numpy : 1.22.4
skbio : 0.5.9

Related:
Error: cannot import name 'SpearmanRConstantInputWarning' from 'scipy.stats'
Cannot import name 'ConstantInputWarning' from 'scipy.stats' #4
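
A small diagnostic can confirm what the active environment actually provides before running mercat2. The sketch below reports the installed SciPy and scikit-bio versions and whether `scipy.stats` exposes `ConstantInputWarning`. The helper names `installed_version` and `has_stats_name` are hypothetical, and the script only inspects the environment; it does not say which version combination the maintainers recommend.

```python
import importlib
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_stats_name(name="ConstantInputWarning"):
    """Report whether scipy.stats exposes the given name in this environment."""
    try:
        stats = importlib.import_module("scipy.stats")
    except ImportError:
        return False  # SciPy itself is not importable here.
    return hasattr(stats, name)


print("scipy:", installed_version("scipy"))
print("scikit-bio:", installed_version("scikit-bio"))
print("ConstantInputWarning available:", has_stats_name())
```

If the last line prints False while scikit-bio is installed, the import in `skbio/stats/distance/_mantel.py` shown in the traceback will fail in that environment.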

Option for transposed table output

Thank you very much for this package, it's been a great help!

It would be very helpful to have an option to output transposed tables, with the kmers along the columns and the fasta files as rows. I'm currently using an awk script to transpose them, but when k and the number of genomes get larger, memory becomes an issue.
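
Until such an option exists, one way to keep memory flat is to transpose by re-reading the table once per column instead of loading it whole. The sketch below is a workaround under stated assumptions, not a MerCat2 feature: it assumes a plain tab-separated table whose fields contain no tabs or newlines, and `transpose_tsv` and the file names are hypothetical.

```python
import csv
import os
import tempfile


def transpose_tsv(src_path, dst_path):
    """Transpose a TSV while holding only one input row in memory.

    The input is re-read once per column, trading run time for memory.
    Fields are written verbatim, so they must not contain tabs or newlines.
    """
    with open(src_path, newline="") as src:
        header = next(csv.reader(src, delimiter="\t"), [])
    with open(dst_path, "w", newline="") as dst:
        for col in range(len(header)):
            with open(src_path, newline="") as src:
                for i, row in enumerate(csv.reader(src, delimiter="\t")):
                    if i:
                        dst.write("\t")
                    # Pad short rows with an empty field.
                    dst.write(row[col] if col < len(row) else "")
            dst.write("\n")


# Tiny made-up table: k-mers as rows, samples as columns.
workdir = tempfile.mkdtemp()
src_file = os.path.join(workdir, "kmers.tsv")
dst_file = os.path.join(workdir, "kmers_transposed.tsv")
with open(src_file, "w", newline="") as handle:
    handle.write("kmer\tsampleA\tsampleB\nAAA\t1\t2\nAAC\t3\t4\n")

transpose_tsv(src_file, dst_file)
with open(dst_file, newline="") as handle:
    result = handle.read()
print(result)
```

Because the original table has one column per fasta file, the number of passes equals the number of genomes plus one, which is usually far smaller than the number of k-mers.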

Difference between mercat and mercat2?

Hi

I was wondering what the main difference between mercat and mercat2 is. From the readme, it looks like the difference is the output and the plots. Is that correct, or are there any differences in the underlying methods?

thanks!

Failed to start dashboard / cannot import name 'packaging' from 'pkg_resources' / Unable to register worker with raylet. No such file or directory

I am trying to run mercat2 on SwissProt. In a little test invocation,

   mercat2.py -i uniprot_sprot.fasta -k1 -c1 -skipclean -o k1

I get the below errors.

I also tried starting a Ray cluster manually by

   ray start --head

but that gives me another error as below.

What am I doing wrong?

Many thanks,
David.

Error messages from mercat2 alone:

Starting MerCat2 v1.4.1 with k-mer 1 and 48 threads

2024-05-23 19:52:54,205 ERROR services.py:1207 -- Failed to start the dashboard , return code 1
2024-05-23 19:52:54,206 ERROR services.py:1232 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2024-05-23 19:52:54,206 ERROR services.py:1276 -- 
The last 20 lines of /tmp/ray/session_2024-05-23_19-52-52_585467_40696/logs/dashboard.log (it contains the error message from the dashboard): 
  File "/bi/home/dkreil/.conda/envs/mercat2/lib/python3.10/site-packages/ray/dashboard/dashboard.py", line 75, in run
    await self.dashboard_head.run()
  File "/bi/home/dkreil/.conda/envs/mercat2/lib/python3.10/site-packages/ray/dashboard/head.py", line 322, in run
    modules = self._load_modules(self._modules_to_load)
  File "/bi/home/dkreil/.conda/envs/mercat2/lib/python3.10/site-packages/ray/dashboard/head.py", line 219, in _load_modules
    head_cls_list = dashboard_utils.get_all_modules(DashboardHeadModule)
  File "/bi/home/dkreil/.conda/envs/mercat2/lib/python3.10/site-packages/ray/dashboard/utils.py", line 121, in get_all_modules
    importlib.import_module(name)
  File "/bi/home/dkreil/.conda/envs/mercat2/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/bi/home/dkreil/.conda/envs/mercat2/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 10, in <module>
    from pkg_resources import packaging
ImportError: cannot import name 'packaging' from 'pkg_resources' (/bi/home/dkreil/.conda/envs/mercat2/lib/python3.10/site-packages/pkg_resources/__init__.py)
2024-05-23 19:52:54,252 INFO worker.py:1621 -- Started a local Ray instance.
[2024-05-23 19:52:54,936 E 40696 40696] core_worker.cc:201: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory

Error messages from mercat2 after first starting a Ray head node:

Starting MerCat2 v1.4.1 with k-mer 1 and 48 threads

2024-05-23 20:23:58,024 INFO worker.py:1431 -- Connecting to existing Ray cluster at address: 141.244.140.16:6379...
Traceback (most recent call last):
  File "/bi/home/dkreil/.conda/envs/mercat2/bin/mercat2.py", line 508, in <module>
    mercat_main()
  File "/bi/home/dkreil/.conda/envs/mercat2/bin/mercat2.py", line 217, in mercat_main
    ray.init(num_cpus=m_num_cores, log_to_driver=DEBUG)
  File "/bi/home/dkreil/.conda/envs/mercat2/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/bi/home/dkreil/.conda/envs/mercat2/lib/python3.10/site-packages/ray/_private/worker.py", line 1523, in init
    raise ValueError(
ValueError: When connecting to an existing cluster, num_cpus and num_gpus must not be provided.
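
The second traceback shows `ray.init` being called with `num_cpus` while Ray attaches to the cluster started by `ray start --head`, which the error message says is not allowed. The sketch below illustrates that rule by building different keyword arguments for the two cases. `ray_init_kwargs` and its `connect_to_existing` flag are hypothetical and not part of mercat2; only the `num_cpus`, `log_to_driver`, and `address` parameter names are taken from Ray's `ray.init`.

```python
def ray_init_kwargs(num_cores, connect_to_existing, debug=False):
    """Build keyword arguments for ray.init for the two start-up modes."""
    kwargs = {"log_to_driver": debug}
    if connect_to_existing:
        # Resources are defined by the running cluster, so num_cpus is omitted.
        kwargs["address"] = "auto"
    else:
        # A fresh local instance may be told how many CPUs to use.
        kwargs["num_cpus"] = num_cores
    return kwargs


print(ray_init_kwargs(48, connect_to_existing=False))
print(ray_init_kwargs(48, connect_to_existing=True))
```

The resulting dictionary would be passed as `ray.init(**kwargs)`. With a stock mercat2 install, the simpler route is probably to avoid starting a head node manually, so that the tool launches its own local instance.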

Recommendations

Hi @raw-lab @raw937

Thanks for your previous answer regarding the difference between mercat and mercat2. I was wondering if you have any recommendations on:

  1. Number of input reads to use (I have Illumina reads and I've tried downsampling to 100k, 1M, 5M, etc.)
  2. k-mer length (I've used k=21, which, depending on the input, can affect run time)
  3. I was thinking of using mercat2 to quickly calculate the Shannon/Simpson index as a way to compare diversity between samples (running mercat2 individually on each sample). I was wondering whether it can be used that way.

thanks again for your help!

Will
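
For reference, the two indices mentioned in point 3 can be computed from any vector of k-mer counts as sketched below. This is a generic textbook calculation, not MerCat2's implementation: the natural logarithm and the Gini-Simpson form (1 minus the sum of squared proportions) are assumptions about which variants are wanted, and the counts are made up.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson index: 1 - sum(p_i^2)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Made-up k-mer counts for two samples with the same total.
even_sample = [5, 5, 5, 5]
skewed_sample = [17, 1, 1, 1]

for name, sample in (("even", even_sample), ("skewed", skewed_sample)):
    print(name, round(shannon_index(sample), 3), round(simpson_index(sample), 3))
```

Both indices are higher for the even sample than for the skewed one. Because they depend on the proportions of observed k-mers, comparing samples is most meaningful when k and the sequencing depth are kept the same across samples.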
