MrOlm commented on August 20, 2024

Wow, this is bizarre. I've never encountered anything like this before. It does seem like it could be a filesystem problem, but I've run with 10,000 genomes and it's been fine, so if so it would be somewhat specific to your machine. A couple of thoughts:

  1. You could try a different algorithm. For example, --S_algorithm ANIn is similar to the one you're using but doesn't have a filtering step (which seems to be where this is messing up). As a result it also only makes 1 file instead of 2. ANImf is generally better, but honestly they both give extremely similar results in most cases, so I wouldn't be worried about it.

     You could also try raising the -pa threshold to 0.95 or something, so that you end up doing fewer genomic comparisons and thus making fewer files.

  2. You could try running with --debug, which would tell you exactly the error that nucmer is crashing with, but would make way more files. The fact that this works on subsets of the genome list makes me hesitant to think that this is a problem with some genome, though (which is what --debug would tell you).

  3. Figuring out if it's the same genome that it's crashing on every time, or if it crashes after some number of comparisons, would be helpful for pinpointing the problem. If it's a filesystem error, it could be too many files in one directory (each genome has its own folder storing these .delta files, so this could happen if you end up with a giant primary cluster). Maybe try to make another couple of files in the directory '/hps/nobackup2/production/metagenomics/clustering/iter_8/drep32/data/ANImf_files/ERS608499_12.fa/' and see if you get some sort of error?

  4. I got an email that it seemed to be a problem with a file ending with .f.delta instead of .fa.delta, but that isn't showing up here. Was that comment deleted or is it still a clue we should consider?
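
To check the two directory-related ideas above (too many files in one genome folder, and stray file names such as .f.delta), something like the following could help. It is only a rough sketch: `summarize_delta_dir` is a hypothetical helper, not part of dRep, and it assumes that every genome file ends in .fa so that well-formed comparison files end in .fa.delta.

```python
from collections import Counter
from pathlib import Path


def summarize_delta_dir(root, expected_suffix=".fa.delta"):
    """Count files per sub-folder and list .delta files with an unexpected ending.

    expected_suffix is an assumption (genomes named *.fa); adjust it to match
    the real genome file extension.
    """
    root = Path(root)
    counts = Counter()   # files found in each folder, keyed by path relative to root
    suspicious = []      # .delta files that do not end with expected_suffix
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        counts[str(path.parent.relative_to(root))] += 1
        if path.name.endswith(".delta") and not path.name.endswith(expected_suffix):
            suspicious.append(path)
    return counts, sorted(suspicious)
```

Calling `summarize_delta_dir` on the ANImf_files directory and looking at `counts.most_common(5)` would show whether any single genome folder is unusually large, and a non-empty `suspicious` list would suggest the .f.delta clue is still worth following.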

Best,
-Matt

from drep.

alexmsalmeida commented on August 20, 2024

Hi Matt,

Thanks for the detailed response. It most likely is a random filesystem issue as just isolating those two genomes that were giving me issues does not result in the same error (I can de-replicate them without a problem).

I will try your additional suggestion of using the ANIn algorithm to see if it goes through. I am also re-running it with --debug to see if I can spot anything else.

It doesn't seem to be related to too many files in one directory (that particular one has ~1000 files, which is under the recommended limit in our filesystem of 5000 files). I am also able to create additional files in that same dir without a problem.
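
For anyone wanting to repeat that check, here is a minimal sketch of the kind of probe I mean. `probe_directory` is just an illustrative name and the number of test files is arbitrary; it counts the files already present, tries to create a few temporary ones, and cleans them up again.

```python
import os
import tempfile


def probe_directory(directory, n_files=10):
    """Return (existing_file_count, error) after trying to create n_files temp files.

    error is None if every file could be created, otherwise the OSError raised.
    """
    existing = sum(1 for entry in os.scandir(directory) if entry.is_file())
    created = []
    error = None
    try:
        for _ in range(n_files):
            fd, path = tempfile.mkstemp(prefix="probe_", dir=directory)
            os.close(fd)
            created.append(path)
    except OSError as exc:
        error = exc  # e.g. quota, inode, or per-directory limits
    finally:
        # Remove the probe files so the directory is left as it was found.
        for path in created:
            os.remove(path)
    return existing, error
```

A result with `error` set to `None` only shows that the directory accepted new files at that moment, so it would not rule out an intermittent filesystem problem.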

I deleted that last comment because I realized it was a copy and paste error, so sorry about that.

Will try to figure it out, thanks for your help.

Alex
