Comments (4)

al-mcintyre commented on August 28, 2024

Hi Eduardo,

Can you run traceback() after the error occurs and send the details? As far as I can tell, the chromosome names shouldn't be an issue unless there's a mismatch between your alignment/peak files and your gtf -- does the gtf also use "1" rather than "chr1"?

Thanks!

from deq.

edfajardo commented on August 28, 2024

Yes, that's the problem: the gtf file has chromosome ids with "chr" in them, as in chr1, while the alignments just show "1" for chr1. I am rerunning the alignments with new indices that correct the chromosome names, and will see what happens. I'll let you know at the end of the day. Our day, that is :)

Thanks!

edfajardo commented on August 28, 2024

OK, problem solved. The program ran to completion with no errors. Let me give some details that might help somebody else.

This experiment was done in mouse, and I did the alignments with HISAT2. Instead of making my own indices, I downloaded the indices prepared by the Johns Hopkins people (this is where the program comes from). Initially I downloaded the grcm38_tran indices (https://cloud.biohpc.swmed.edu/index.php/s/grcm38_tran/download). As I found out after getting the errors described above, the chromosome ids in these indices don't follow the standard format, chrXX, where XX is the chromosome number (or X or Y). Instead, the "chr" part is dropped.

To run DEQ, you have to provide a gtf file for the appropriate genome. I used the Ensembl file for mouse, mm10.ensGene.gtf (the RefSeq gtf, mm10.refGene.gtf, does not work). The chromosome ids in the gtf files are in the standard format, i.e., chrXX, as described above. So when DEQ tries to map the peaks to the appropriate region of the genome, it can't, because the peak bed files, which inherit their chromosome ids from the alignments, will not have "chr" in the chromosome ids for the peak coordinates. This is what causes the error.
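For anyone who wants to check for this mismatch before running DEQ, here is a minimal Python sketch. It is not part of DEQ; the function names `chrom_names` and `compare_chrom_names` are made up for illustration, and it assumes both the gtf and the peak bed file are tab-separated with the chromosome id in the first column. The two toy records at the bottom are invented stand-ins for real files.

```python
import io


def chrom_names(lines):
    """Collect the first tab-separated field of every data line.

    Skips blank lines, '#' comments (gtf headers) and the optional
    'track'/'browser' header lines that some bed files carry.
    """
    names = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_chrom_names(gtf_lines, bed_lines):
    """Report which chromosome ids are shared and which appear in only one file."""
    gtf = chrom_names(gtf_lines)
    bed = chrom_names(bed_lines)
    return {
        "shared": gtf & bed,
        "gtf_only": gtf - bed,
        "bed_only": bed - gtf,
    }


# Toy data (hypothetical): a gtf record using "chr1" and a peak using "1".
gtf_example = io.StringIO('chr1\tensGene\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n')
bed_example = io.StringIO("1\t120\t180\tpeak1\n")

report = compare_chrom_names(gtf_example, bed_example)
print(report)
```

With real data you would pass open file handles instead, for example `open("mm10.ensGene.gtf")` and `open("peaks.bed")`, where `peaks.bed` is a placeholder for your own peak file. An empty "shared" set, with ids like "chr1" on one side and "1" on the other, points to the naming problem described above.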

The solution was to make my own indices from the mm10 genomic sequences. There are other pre-made indices available (ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/mm10.tar.gz) but I have not tried them. The human hg38 (https://cloud.biohpc.swmed.edu/index.php/s/hg38_tran/download) and rat rn6 (ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/rn6.tar.gz) indices don't have this problem.

I know this is a long message but I wanted to provide as much detail as possible. Thank you for your help.

Eduardo

al-mcintyre commented on August 28, 2024

Thanks Eduardo!
