Comments (12)
Please refer to #40. Your barcode whitelist contains either barcodes with 'N's or duplicated barcodes. Removing them should resolve the problem.
I really should add a feature that removes barcodes containing 'N' and duplicated barcodes automatically. I will try to get this into the next release.
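For anyone who wants to run this check before mapping, here is a minimal sketch in C++. It is not part of Chromap: `WhitelistReport` and `CheckWhitelist` are hypothetical names, and the default length limit of 32 is taken from the maintainer's statement later in this thread. Lowercase bases are counted as non-ACGT here, which is an assumption of the sketch.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical report type, not part of Chromap.
struct WhitelistReport {
  std::size_t with_non_acgt = 0;  // entries containing N or any other non-ACGT character
  std::size_t duplicates = 0;     // entries identical to an earlier entry
  std::size_t too_long = 0;       // entries longer than max_length
};

// Counts the three kinds of problematic whitelist entries discussed in this thread.
inline WhitelistReport CheckWhitelist(const std::vector<std::string> &barcodes,
                                      std::size_t max_length = 32) {
  WhitelistReport report;
  std::unordered_set<std::string> seen;
  for (const std::string &barcode : barcodes) {
    if (barcode.find_first_not_of("ACGT") != std::string::npos) {
      ++report.with_non_acgt;
    }
    if (!seen.insert(barcode).second) {  // insert fails if already present
      ++report.duplicates;
    }
    if (barcode.size() > max_length) {
      ++report.too_long;
    }
  }
  return report;
}
```

A whitelist that is safe to pass on should yield zero in all three counters.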
from chromap.
Hi, there are no N's or duplicates in my BC list - I've verified this programmatically
from chromap.
I have narrowed the error down further. The first 96 items in the list are OK:
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGATATCGAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTTGTCCGCCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAGCTGTTGGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACCAGACGTAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCGTTATCGTGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGATTGAACCACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTCATGTGAAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTGTAATCCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGGATCTATTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGCGGATACAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTGCTTGCCGTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCAGTAAGGTCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACCTATCAGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGATCGGAACCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCCATCTCTGTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCTGACAACGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGTTGGAAGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTCCTATCATGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACTAATGACGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCACAGACAACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACGAGGATTGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGAACTTGTAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTTCACATTCCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCATGTCTGCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGCTAACTACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGGTTAATCACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGATCCAGTGCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTGGTTCTGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCCGACATTAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTTGAGTTCTCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACTAGCTCTCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGAGCGGTAACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCAGTGTGGAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAAGTGATCGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACTAGGCTTCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTAATGTGTGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACGTATCCAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGAACGTGCGTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGGTTAGAGCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCATCGTCGTAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCATTCCAAGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACTCTCTACGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACGAACCGACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAGTTCTTGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCAAGCACCGTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAGTTGGACTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAACATTCAGGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTATCTGCTTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTCTACGAACGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGAGGACAAGGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAATGCCGTGCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAAGAGGCTAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGGTTAGCAGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGGTGACTATAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCAGACCAGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTCCTGATAAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACCTCGAATCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAGATGCTCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGAGGTACTACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTTGAAGAAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTATTAGGCTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTTGGATGCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTAGAAGGCTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTATCAGCCGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCTGTCCTAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGATGGTTGTTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTAACGGCTAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAGTGTGTTGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCATCAGAAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTGACGTTGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCAATGATTGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGACCAATCGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGGACGTTCGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTGTATCAGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTTCTAGGTCGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAATCTGTGCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTGCCAATAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGATACAAGAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCACTGGTGGTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTGAGTAGCCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTGTGAAGAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACACAACGCCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCATTCCTAAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGGAGACTCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCCAGTCAATCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTTCCATGCTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTTCGCACAACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCCATTGAGGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGATGACTCCAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAATGATGCGGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTGTCACAACAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTGGTCTACTCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTACCTTCGATGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGATTAGAGAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGAGCTAGTAAGT
When the 97th item is added to the whitelist, the hash check fails even though it's unique:
TGTAGCAAGTAGGGTACTCGCCGGTTACATGCAGTAGCTGTGACCGTACTGT
from chromap.
More specifically, it looks like your key generator produces the same key for two different barcodes:
BC-KEY 191644134523
BC-DATA TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTGACCGTACTGT
HASH-CODE 1
BC-KEY 191644134523
BC-DATA TGTAGCAAGTAGGGTACTCGCCGGTTACATGCAGTAGCTGTGACCGTACTGT
HASH-CODE 0
from chromap.
The seed number seems to get bigger until position 20 and then tails off:
HERE 33
HERE 314
HERE 359
HERE 3236
HERE 3946
HERE 33785
HERE 315140
HERE 360560
HERE 3242242
HERE 3968971
HERE 33875884
HERE 315503538
HERE 362014154
HERE 3248056618
HERE 3992226475
HERE 33968905900
HERE 315875623601
HERE 363502494407
HERE 3254009977629
HERE 31016039910518
HERE 3765624758747
HERE 3863475779439
HERE 3155368234428
HERE 3621472937714
HERE 3286868495307
HERE 347962353455
HERE 3191849413822
HERE 3767397655290
HERE 3870567365608
HERE 3183734579105
HERE 3734938316422
HERE 3740730010137
HERE 3763896784996
HERE 3856563884434
HERE 3127720654411
HERE 3510882617644
HERE 3944018842802
HERE 3477540487881
HERE 3810650323751
HERE 31043578039454
HERE 3875777274491
HERE 3204574214638
HERE 3818296858552
HERE 31074164178657
HERE 3998121831301
HERE 3693952441878
HERE 3576786511963
HERE 3108122792300
HERE 3432491169201
HERE 3630453049031
HERE 3322788940574
HERE 3191644134523
BC-KEY 191644134523
BC-DATA TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTGACCGTACTGT
HERE 33
HERE 314
HERE 359
HERE 3236
HERE 3946
HERE 33785
HERE 315140
HERE 360560
HERE 3242242
HERE 3968971
HERE 33875884
HERE 315503538
HERE 362014154
HERE 3248056618
HERE 3992226475
HERE 33968905900
HERE 315875623601
HERE 363502494407
HERE 3254009977629
HERE 31016039910518
HERE 3765624758745
HERE 3863475779429
HERE 3155368234390
HERE 3621472937562
HERE 3286868494699
HERE 347962351023
HERE 3191849404092
HERE 3767397616369
HERE 3870567209924
HERE 3183733956371
HERE 3734935825486
HERE 3740720046393
HERE 3763856930020
HERE 3856404464530
HERE 3127082974795
HERE 3508331899180
HERE 3933815968946
HERE 3436728992457
HERE 3647404342055
HERE 3390594112670
HERE 3462864822907
HERE 3751947663854
HERE 3808767399864
HERE 31036046343905
HERE 3845650492293
HERE 384067085846
HERE 3336268343387
HERE 3245561745772
HERE 3982246983089
HERE 3630453049031
HERE 3322788940574
HERE 3191644134523
BC-KEY 191644134523
BC-DATA TGTAGCAAGTAGGGTACTCGCCGGTTACATGCAGTAGCTGTGACCGTACTGT
HASH-CODE 0
from chromap.
From
inline uint64_t GenerateSeedFromSequenceAt(uint32_t sequence_index,
                                           uint32_t start_position,
                                           uint32_t seed_length) const {
  const char *sequence = GetSequenceAt(sequence_index);
  uint32_t sequence_length = GetSequenceLengthAt(sequence_index);
  uint64_t mask = (((uint64_t)1) << (2 * seed_length)) - 1;
  uint64_t seed = 0;
  for (uint32_t i = 0; i < seed_length; ++i) {
    if (start_position + i < sequence_length) {
      uint8_t current_base =
          SequenceBatch::CharToUint8(sequence[i + start_position]);
      if (current_base < 4) {  // not an ambiguous base
        seed = ((seed << 2) | current_base) & mask;  // forward k-mer
      } else {
        seed = (seed << 2) & mask;  // N->A
      }
    } else {
      seed = (seed << 2) & mask;  // Pad A
    }
  }
  return seed;
}
from chromap.
Thank you so much for investigating this issue! I didn't expect your barcodes to be that long. For now, Chromap only supports barcodes up to 32bp, and yours are more than 50bp. May I ask which sequencing assay or protocol this is?
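The length limit can be illustrated with a small sketch. Each base takes 2 bits, so a 64-bit key holds at most 32 bases. The code below is not Chromap's implementation: `BaseToCode` and `PackBarcode` are hypothetical helpers that mimic the quoted function with an explicit key width, assuming the encoding A=0, C=1, G=2, T=3 (which is consistent with the first values in the debug log above, e.g. TGT giving 59).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: 2-bit code per base. Anything that is not C, G, or T
// is treated as A, like the N->A branch in the quoted function.
inline uint64_t BaseToCode(char base) {
  switch (base) {
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return 0;  // 'A', 'N', and any other character
  }
}

// Packs a barcode into a key, keeping only the lowest key_bits bits.
// key_bits must be in [1, 64]; a shift by 64 or more would be undefined
// behavior in C++, so the full-width mask is built without shifting.
inline uint64_t PackBarcode(const std::string &barcode, unsigned key_bits) {
  assert(key_bits >= 1 && key_bits <= 64);
  const uint64_t mask =
      key_bits == 64 ? ~uint64_t{0} : (uint64_t{1} << key_bits) - 1;
  uint64_t key = 0;
  for (char base : barcode) {
    key = ((key << 2) | BaseToCode(base)) & mask;  // older bases fall off the top
  }
  return key;
}
```

With `key_bits = 40`, both 52-bp barcodes from the collision report pack to 191644134523, the BC-KEY shown above, because only their last 20 bases (which are identical) survive the mask. The value 40 is an inference, not something stated in the thread: all logged keys stay below 2^40, and 2 * 52 = 104 equals 40 modulo 64, which is how x86-64 shift instructions treat an oversized shift count. In C++ itself, a shift by 104 on a 64-bit integer is undefined behavior, so other platforms or compilers may behave differently.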
from chromap.
It's a custom single-cell protocol that follows HyDrop. The barcode is 50bp because of the combinatorial barcode extensions during barcode bead production.
The barcode consists of three 10bp barcode segments with two repeated sequences in between. The actual barcode is therefore only 30bp long, but the sequence that has to be detected in the index reads is 50bp.
Is there any way I can get the software to work with it?
BC1-REPSEQ1-BC2-REPSEQ2-BC3
from chromap.
@mourisl can you comment here? Does the current "--read-format" option support this kind of input?
from chromap.
Thanks for the ping. I don't think Chromap can handle non-consecutive barcodes or barcodes longer than 32bp, so you may need to preprocess the data to concatenate BC1-BC2-BC3.
In the future, we will come up with a way to specify the non-consecutive case, though it is likely that in the output it will be BC1-BC2-BC3 instead of BC1-REPSEQ1-BC2-REPSEQ2-BC3.
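The preprocessing step could look roughly like the sketch below. It is not a Chromap feature: `ConcatenateSegments` and `CollapseBarcode` are hypothetical helpers, and the segment offsets assume that BC1, REPSEQ1, BC2, REPSEQ2, and BC3 are each exactly 10bp and start at position 0 of the barcode read. Adjust the offsets to the real layout of your protocol.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: joins the given {offset, length} segments of a
// barcode read into one consecutive barcode.
inline std::string ConcatenateSegments(
    const std::string &barcode_read,
    const std::vector<std::pair<std::size_t, std::size_t>> &segments) {
  std::string joined;
  for (const auto &segment : segments) {
    const std::size_t offset = segment.first;
    const std::size_t length = segment.second;
    if (offset + length > barcode_read.size()) {
      throw std::out_of_range("segment extends past the end of the barcode read");
    }
    joined.append(barcode_read, offset, length);
  }
  return joined;
}

// Assumed layout (10bp each): BC1 at 0, REPSEQ1 at 10, BC2 at 20,
// REPSEQ2 at 30, BC3 at 40. Only the three BC segments are kept.
inline std::string CollapseBarcode(const std::string &barcode_read) {
  return ConcatenateSegments(barcode_read, {{0, 10}, {20, 10}, {40, 10}});
}
```

The same transformation has to be applied to both the whitelist entries and the barcode reads, so that the resulting 30bp barcodes still match each other and stay within the 32bp limit.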
from chromap.
Thanks for your responses. This is indeed what I did yesterday, and the aligner has now completed running with no errors.
from chromap.
Following up on this (closed) issue: I am going to process some SHARE-seq data, which produces barcodes longer than 32bp through combinatorial indexing. I understand that non-DNA characters are not allowed, so I cannot create a whitelist with arbitrary strings (representing the barcode sequences). Is that correct?
from chromap.