
Comments (8)

asgray commented on August 11, 2024

Hi, can you provide the command you used to invoke RepeatClassifier? The first thing I want to confirm is that the configured library is being mounted into the container properly.

Another possible problem is that the automatic update script only extracts the curated families from the .h5 files. It's possible that it's just not including the families that would be useful to you, and if that's the case I can help work around that.

from tetools.

matthew-ackerman commented on August 11, 2024

matthew-ackerman commented on August 11, 2024

asgray commented on August 11, 2024

A read-only file system could certainly be causing issues. tetoolsDfamUpdate.pl rewrites /opt/RepeatMasker/Libraries/RepeatMasker.lib, so you can confirm whether the update actually ran by checking that file's timestamp with ls -al.
Can you share the error message when you use the shell -w command?
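The timestamp check suggested above can be wrapped in a small helper. This is only a sketch: check_lib is a hypothetical name, and the default path is the one quoted in this comment. It reports writability as seen by the current user, so running it before and after tetoolsDfamUpdate.pl should show whether the file was rewritten.

```shell
# Report the timestamp and writability of a RepeatMasker library file.
# check_lib is a hypothetical helper; the default path is the one named above.
check_lib() {
    local lib="${1:-/opt/RepeatMasker/Libraries/RepeatMasker.lib}"
    if [ ! -e "$lib" ]; then
        echo "missing: $lib"
        return 1
    fi
    # The modification time shows whether the update script rewrote the file.
    ls -al "$lib"
    if [ -w "$lib" ]; then
        echo "writable: $lib"
    else
        echo "read-only: $lib"
        return 2
    fi
}

# Example, run inside the container before and after the update:
#   check_lib
```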


matthew-ackerman commented on August 11, 2024

asgray commented on August 11, 2024

Thanks for confirming that the container is working. It looks like the issue is that tetoolsDfamUpdate.pl is only importing curated families into RepeatMasker.lib, and since there are no curated Daphnia families, it makes sense that you're not seeing any additional hits. To work around this, one solution is to use famdb.py to directly query the data you need.

First, I ran a few lineage queries to confirm what is in the FamDB files:

python3 famdb.py -i ./dfam38_full lineage -d daphnia
6668 Daphnia(8) [0]
├─6669 Daphnia pulex(8) [0]
└─35523 Daphnia pulicaria(8) [937]

python3 famdb.py -i ./dfam38_full lineage -d --curated daphnia
6668 Daphnia(8) [0]
├─6669 Daphnia pulex(8) [0]
└─35523 Daphnia pulicaria(8) [0]

This confirms that there are 937 uncurated families in file 8, all of them from Daphnia pulicaria, and no curated ones. To extract them, you can use the families -f fasta_name command and append the result to your RepeatMasker.lib. The following command should generate the FASTA output from FamDB, prepend a newline to it, and append it to RepeatMasker.lib.

python3 <host path to /RepeatMasker>/famdb.py -i <host path to /RepeatMasker>/Libraries/famdb families -d -f fasta_name daphnia | (echo && cat) >> <host path to /RepeatMasker>/Libraries/RepeatMasker.lib
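To check that the append did what was intended, one option is to count FASTA header lines before and after. The sketch below is an illustration, not part of TE Tools: count_fasta_records and append_fasta are hypothetical helper names, and append_fasta simply wraps the "(echo && cat) >>" idiom from the command above. The leading newline keeps the first new header on its own line even if the library does not end with a newline.

```shell
# Count FASTA records (lines starting with ">") in a file.
# Prints 0 for a file with no headers instead of failing.
count_fasta_records() {
    grep -c '^>' "$1" || true
}

# Append FASTA text from stdin to a library file, preceded by a newline,
# mirroring the "(echo && cat) >>" idiom used in the command above.
append_fasta() {
    (echo && cat) >> "$1"
}

# Example (placeholders left as in the original command):
#   LIB="<host path to /RepeatMasker>/Libraries/RepeatMasker.lib"
#   before=$(count_fasta_records "$LIB")
#   python3 <host path to /RepeatMasker>/famdb.py \
#       -i <host path to /RepeatMasker>/Libraries/famdb \
#       families -d -f fasta_name daphnia | append_fasta "$LIB"
#   after=$(count_fasta_records "$LIB")
#   echo "added $((after - before)) records"
```

If each exported family produces one FASTA record, the difference should be close to the 937 families reported by the lineage query; that one-to-one mapping is an assumption here.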

More information regarding FamDB commands can be found here. If you need to add other FamDB files, be aware that tetoolsDfamUpdate.pl will overwrite any changes you have made to RepeatMasker.lib. You may choose to run tetoolsDfamUpdate.pl anyway; if you don't, you might need to edit rmlib.config by hand. If you do rerun tetoolsDfamUpdate.pl, be sure to export only uncurated families from FamDB, so that curated families are not duplicated in RepeatMasker.lib.
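A quick way to spot doubled-up families after editing the library is to look for repeated header names. This is a rough sketch with a hypothetical helper name, duplicate_fasta_names, and it assumes the first whitespace-delimited token of each header identifies the family, which may not hold for every library.

```shell
# List FASTA header names that occur more than once in a library file.
# Assumption: the first whitespace-delimited token of a header line
# identifies the family; anything after it is ignored.
duplicate_fasta_names() {
    grep '^>' "$1" | awk '{print $1}' | sort | uniq -d
}

# Example:
#   duplicate_fasta_names "<host path to /RepeatMasker>/Libraries/RepeatMasker.lib"
```

No output means no header name was repeated.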

Hopefully that produces better results for you, but let me know either way.


matthew-ackerman commented on August 11, 2024

asgray commented on August 11, 2024
