Hi,
my best guess is that your FAI file contains a chromosome that is not present in the input files (the chromatin accessibility and motif files). When predicting, you may specify "prediction chromosomes" with parameter p; if that parameter is not set, the prediction chromosomes are internally set to all chromosomes present in the FAI (in sorted order). The process then looks for the current chromosome by sequentially scanning all input files line by line. If that chromosome cannot be found on any line, the readers scanning those input files finally return "null", which might be the case here.
An alternative explanation might be that the input files do not contain a single line (so that the readers return "null" right from the beginning), but as those files worked in the training step, I consider this rather unlikely.
If that does not explain your issue, I would provide a Catchitt version with additional debugging output that could be posted here to localise the error.
Best,
Jan
from jstacs.
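The first hypothesis above can be checked outside of Catchitt. The following is only a minimal sketch, not part of Jstacs: it assumes that the chromosome name is the first tab-separated column of the gzipped feature files (the thread does not confirm the column layout), and all function names are hypothetical.

```python
import gzip


def fai_chromosomes(fai_path):
    """Return the chromosome names (first column) of a FASTA index, in file order."""
    with open(fai_path) as handle:
        return [line.split()[0] for line in handle if line.strip()]


def feature_chromosomes(tsv_gz_path):
    """Return the distinct chromosome names of a gzipped TSV, in order of first appearance.

    Assumption: the chromosome name is the first tab-separated column.
    """
    seen = []
    known = set()
    with gzip.open(tsv_gz_path, "rt") as handle:
        for line in handle:
            name = line.split("\t", 1)[0].rstrip("\n")
            if name and name not in known:
                known.add(name)
                seen.append(name)
    return seen


def missing_from_features(fai_path, tsv_gz_path):
    """List the FAI chromosomes that never occur in the feature file."""
    present = set(feature_chromosomes(tsv_gz_path))
    return [name for name in fai_chromosomes(fai_path) if name not in present]
```

Calling `missing_from_features("mm9_main.fa.fai", "Chromatin_accessibility.tsv.gz")` would list every chromosome that the FAI names but the accessibility file lacks; an empty list would argue against this explanation.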
Thanks for the quick reply!
I have tried predicting on individual chromosomes and found that chrY causes this error. But chrY is present in both the Motif and Accessibility features.tsv files (and in the .fai file). I don't know the reason for the bug, but I basically get what I need except for chrY, which is really not a problem for me.
Thanks,
Yichao
Thanks for the swift reply, and great that you found a way to work around this issue.
However, I would appreciate it if you could check that the IDs for "chrY" are exactly identical (including upper/lower case) in the FAI and feature files. If they are, there might be a bug on our side (although I currently don't see where it might come from), and it would be important for us to correct it.
Best,
Jan
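A check for exact identity can be sketched as follows. This is an illustration only, with a hypothetical helper name: given the chromosome names collected from the FAI and from a feature file, it reports pairs that differ solely in upper/lower case or in surrounding whitespace, which are the differences that are easy to miss by eye.

```python
def near_matches(fai_names, feature_names):
    """Pair up names that differ only by case or surrounding whitespace.

    Both arguments are lists of distinct chromosome names. Names that match
    exactly are skipped, so an empty result means no near-miss was found.
    """
    def normalise(name):
        return name.strip().casefold()

    exact = set(fai_names) & set(feature_names)
    by_key = {}
    for name in feature_names:
        by_key.setdefault(normalise(name), []).append(name)

    pairs = []
    for name in fai_names:
        if name in exact:
            continue
        for candidate in by_key.get(normalise(name), []):
            pairs.append((name, candidate))
    return pairs


if __name__ == "__main__":
    # repr() makes trailing blanks and other invisible characters visible.
    for fai_name, feature_name in near_matches(["chr1", "chrY"], ["chr1", "chry "]):
        print(repr(fai_name), "vs", repr(feature_name))
```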
Yes, they are exactly the same, "chrY".
Thanks,
Yichao
Would you mind posting your FAI file and the output file of
gunzip -c Chromatin_accessibility.tsv.gz | grep "chrY" | head -n 1000 | gzip > test_access.tsv.gz
for your accessibility file and, analogously, of
gunzip -c Motif_scores.tsv.gz | grep "chrY" | head -n 1000 | gzip > test_motif.tsv.gz
for one of your motif files?
Sorry for my late response.
[yli11@noderome105 fasta]$ cat mm9_main.fa.fai
chr1 197195432 6 50 51
chr10 129993255 201139354 50 51
chr11 121843856 333732482 50 51
chr12 121257530 458013223 50 51
chr13 120284312 581695911 50 51
chr14 125194864 704385917 50 51
chr15 103494974 832084686 50 51
chr16 98319150 937649567 50 51
chr17 95272651 1037935107 50 51
chr18 90772031 1135113219 50 51
chr19 61342430 1227700698 50 51
chr2 181748087 1290269983 50 51
chr3 159599783 1475653038 50 51
chr4 155630120 1638444823 50 51
chr5 152537259 1797187552 50 51
chr6 149517037 1952775563 50 51
chr7 152524553 2105282947 50 51
chr8 131738871 2260857998 50 51
chr9 124076172 2395231653 50 51
chrM 16299 2521789355 50 51
chrX 166650296 2521805986 50 51
chrY 15902555 2691789294 50 51
The problem seems to occur if the chromosomes do not appear in alphabetical order in the motif/accessibility files and the FeatureReader needs to restart its search for the chromosome from the beginning. This should be fixed with commit da23fb1.
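The failure mode described here can be illustrated with a toy reader. This is a sketch under stated assumptions: it is neither the Jstacs FeatureReader nor the content of commit da23fb1, and the class, its `restart` flag and the example rows are hypothetical. It only shows why a reader that scans strictly forward returns nothing once the requested chromosome order (sorted) disagrees with the order in the file, and how wrapping around to the beginning avoids that.

```python
class ForwardOnlyReader:
    """Toy reader that scans rows front to back and remembers its position."""

    def __init__(self, rows):
        self.rows = rows  # list of tuples whose first element is the chromosome
        self.pos = 0

    def seek_chromosome(self, chrom, restart=False):
        """Advance to the first row of `chrom`; return it, or None if not found."""
        start = self.pos
        while self.pos < len(self.rows):
            if self.rows[self.pos][0] == chrom:
                return self.rows[self.pos]
            self.pos += 1
        if restart:
            # Wrap around and search the part of the file that was skipped earlier.
            self.pos = 0
            while self.pos < start:
                if self.rows[self.pos][0] == chrom:
                    return self.rows[self.pos]
                self.pos += 1
        return None


if __name__ == "__main__":
    # The file lists chr2 before chr10, but sorted order requests chr10 before chr2.
    rows = [("chr1", 0.1), ("chr2", 0.2), ("chr10", 0.3)]
    for restart in (False, True):
        reader = ForwardOnlyReader(rows)
        found = [reader.seek_chromosome(c, restart=restart) for c in sorted(r[0] for r in rows)]
        print("restart =", restart, "->", found)
```

Without the restart, the request for chr2 starts behind its rows and runs off the end of the data, which corresponds to the "null" result discussed earlier in the thread.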
Great! If you can provide a JAR file, I can quickly test it.
I'm not good at Java and am not sure how to get a JAR file from the .java files.
Thanks,
Yichao
Please find the JAR at https://cloud.uzi.uni-halle.de/owncloud/index.php/s/hLgqnVomv3ZGzkR
If it works for you, I would make this a public bugfix release.
It worked!
Thanks,
Yichao