Comments (12)

haowenz commented on August 24, 2024

You can refer to #40. You have either 'N's or duplicated barcodes in your barcode whitelist. Removing them should resolve the problem.

I really should add a feature that removes barcodes containing 'N' and duplicated barcodes. I will try to get this into the next release.
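For anyone who wants to run this check on their own whitelist in the meantime, here is a minimal sketch. It is standalone code, not part of Chromap: `WhitelistReport` and `CheckWhitelist` are hypothetical names, and the sketch assumes the whitelist has already been read into memory with one uppercase barcode per entry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical report type (not a Chromap type).
struct WhitelistReport {
  std::vector<std::size_t> non_acgt_entries;   // 0-based indices of entries with 'N' or any other non-ACGT character
  std::vector<std::size_t> duplicate_entries;  // 0-based indices of entries that repeat an earlier entry
};

// Scans the whitelist once and records every entry that contains a
// non-ACGT character or that exactly repeats an earlier entry.
WhitelistReport CheckWhitelist(const std::vector<std::string> &barcodes) {
  WhitelistReport report;
  std::unordered_set<std::string> seen;
  for (std::size_t i = 0; i < barcodes.size(); ++i) {
    const std::string &barcode = barcodes[i];
    if (barcode.find_first_not_of("ACGT") != std::string::npos) {
      report.non_acgt_entries.push_back(i);
    }
    // insert() returns {iterator, false} when the barcode was already present.
    if (!seen.insert(barcode).second) {
      report.duplicate_entries.push_back(i);
    }
  }
  return report;
}
```

If both vectors come back empty, the whitelist has neither of the two problems described above and the cause must be something else.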

from chromap.

chris-cheshire commented on August 24, 2024

Hi, there are no N's or duplicates in my barcode list. I've verified this programmatically.

chris-cheshire commented on August 24, 2024

I have narrowed the error down further. The first 96 items in the list are OK:

TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGATATCGAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTTGTCCGCCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAGCTGTTGGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACCAGACGTAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCGTTATCGTGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGATTGAACCACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTCATGTGAAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTGTAATCCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGGATCTATTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGCGGATACAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTGCTTGCCGTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCAGTAAGGTCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACCTATCAGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGATCGGAACCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCCATCTCTGTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCTGACAACGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGTTGGAAGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTCCTATCATGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACTAATGACGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCACAGACAACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACGAGGATTGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGAACTTGTAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTTCACATTCCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCATGTCTGCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGCTAACTACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGGTTAATCACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGATCCAGTGCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTGGTTCTGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCCGACATTAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTTGAGTTCTCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACTAGCTCTCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGAGCGGTAACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCAGTGTGGAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAAGTGATCGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACTAGGCTTCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTAATGTGTGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACGTATCCAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGAACGTGCGTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGGTTAGAGCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCATCGTCGTAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCATTCCAAGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACTCTCTACGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACGAACCGACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAGTTCTTGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCAAGCACCGTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAGTTGGACTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAACATTCAGGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTATCTGCTTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTCTACGAACGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGAGGACAAGGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAATGCCGTGCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAAGAGGCTAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGGTTAGCAGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGGTGACTATAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCAGACCAGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTCCTGATAAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACCTCGAATCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAGATGCTCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGAGGTACTACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTTGAAGAAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTATTAGGCTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTTGGATGCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTAGAAGGCTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTATCAGCCGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCTGTCCTAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGATGGTTGTTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCTAACGGCTAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAGTGTGTTGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCATCAGAAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTGACGTTGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCAATGATTGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGACCAATCGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGGACGTTCGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTGTATCAGAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTTCTAGGTCGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTAATCTGTGCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTGCCAATAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGATACAAGAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCACTGGTGGTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTGAGTAGCCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTGTGAAGAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGACACAACGCCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCATTCCTAAGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAGGAGACTCAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGCCAGTCAATCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGTTCCATGCTGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTTCGCACAACGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGCCATTGAGGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGATGACTCCAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGAATGATGCGGGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTGTCACAACAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTGGTCTACTCGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTACCTTCGATGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGATTAGAGAAGT
TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGGAGCTAGTAAGT

When the 97th item is added to the whitelist, the hash check fails even though it is unique:

TGTAGCAAGTAGGGTACTCGCCGGTTACATGCAGTAGCTGTGACCGTACTGT

chris-cheshire commented on August 24, 2024

More specifically, it looks like your key generator is producing clashing keys. Two different barcodes get the same key:

BC-KEY 191644134523
BC-DATA TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTGACCGTACTGT
HASH-CODE 1
BC-KEY 191644134523
BC-DATA TGTAGCAAGTAGGGTACTCGCCGGTTACATGCAGTAGCTGTGACCGTACTGT
HASH-CODE 0

chris-cheshire commented on August 24, 2024

The seed value seems to grow until position 20 and then tails off:

HERE 33
HERE 314
HERE 359
HERE 3236
HERE 3946
HERE 33785
HERE 315140
HERE 360560
HERE 3242242
HERE 3968971
HERE 33875884
HERE 315503538
HERE 362014154
HERE 3248056618
HERE 3992226475
HERE 33968905900
HERE 315875623601
HERE 363502494407
HERE 3254009977629
HERE 31016039910518
HERE 3765624758747
HERE 3863475779439
HERE 3155368234428
HERE 3621472937714
HERE 3286868495307
HERE 347962353455
HERE 3191849413822
HERE 3767397655290
HERE 3870567365608
HERE 3183734579105
HERE 3734938316422
HERE 3740730010137
HERE 3763896784996
HERE 3856563884434
HERE 3127720654411
HERE 3510882617644
HERE 3944018842802
HERE 3477540487881
HERE 3810650323751
HERE 31043578039454
HERE 3875777274491
HERE 3204574214638
HERE 3818296858552
HERE 31074164178657
HERE 3998121831301
HERE 3693952441878
HERE 3576786511963
HERE 3108122792300
HERE 3432491169201
HERE 3630453049031
HERE 3322788940574
HERE 3191644134523
BC-KEY 191644134523
BC-DATA TGTAGCAAGTAGGGTACTCGTTAGTTGGACGCAGTAGCTGTGACCGTACTGT
HERE 33
HERE 314
HERE 359
HERE 3236
HERE 3946
HERE 33785
HERE 315140
HERE 360560
HERE 3242242
HERE 3968971
HERE 33875884
HERE 315503538
HERE 362014154
HERE 3248056618
HERE 3992226475
HERE 33968905900
HERE 315875623601
HERE 363502494407
HERE 3254009977629
HERE 31016039910518
HERE 3765624758745
HERE 3863475779429
HERE 3155368234390
HERE 3621472937562
HERE 3286868494699
HERE 347962351023
HERE 3191849404092
HERE 3767397616369
HERE 3870567209924
HERE 3183733956371
HERE 3734935825486
HERE 3740720046393
HERE 3763856930020
HERE 3856404464530
HERE 3127082974795
HERE 3508331899180
HERE 3933815968946
HERE 3436728992457
HERE 3647404342055
HERE 3390594112670
HERE 3462864822907
HERE 3751947663854
HERE 3808767399864
HERE 31036046343905
HERE 3845650492293
HERE 384067085846
HERE 3336268343387
HERE 3245561745772
HERE 3982246983089
HERE 3630453049031
HERE 3322788940574
HERE 3191644134523
BC-KEY 191644134523
BC-DATA TGTAGCAAGTAGGGTACTCGCCGGTTACATGCAGTAGCTGTGACCGTACTGT
HASH-CODE 0

chris-cheshire commented on August 24, 2024

The key comes from this function:

  inline uint64_t GenerateSeedFromSequenceAt(uint32_t sequence_index,
                                             uint32_t start_position,
                                             uint32_t seed_length) const {
    const char *sequence = GetSequenceAt(sequence_index);
    uint32_t sequence_length = GetSequenceLengthAt(sequence_index);
    uint64_t mask = (((uint64_t)1) << (2 * seed_length)) - 1;
    uint64_t seed = 0;
    for (uint32_t i = 0; i < seed_length; ++i) {
      if (start_position + i < sequence_length) {
        uint8_t current_base =
            SequenceBatch::CharToUint8(sequence[i + start_position]);
        if (current_base < 4) {                        // not an ambiguous base
          seed = ((seed << 2) | current_base) & mask;  // forward k-mer
        } else {
          seed = (seed << 2) & mask;  // N->A
        }
      } else {
        seed = (seed << 2) & mask;  // Pad A
      }
    }
    return seed;
  }

haowenz commented on August 24, 2024

Thank you so much for investigating this issue! I didn't expect your barcodes to be that long. For now, Chromap only supports barcodes up to 32bp, and yours are more than 50bp. May I ask what kind of sequencing assay or protocol this is?
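The 32bp limit follows from the packing in the quoted function: each base takes 2 bits, so a `uint64_t` holds at most 32 bases. For a 52bp barcode the mask expression shifts a 64-bit value by 104 bits, which is undefined behavior in C++. The sketch below is not Chromap code. `PackLastBases` is a hypothetical helper, and it rests on the assumption that the build behaved as if the shift count wrapped modulo 64 (104 mod 64 = 40 bits, i.e. 20 bases), which would match the trace above, where the seed stops growing at position 20.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Packs a barcode at 2 bits per base (A=0, C=1, G=2, T=3; any other
// character is treated as A, as in the quoted function) and keeps only
// the last `kept_bases` bases. kept_bases must be between 1 and 32 so
// that the result fits in a uint64_t.
uint64_t PackLastBases(const std::string &barcode, unsigned kept_bases) {
  assert(kept_bases >= 1 && kept_bases <= 32);
  // Build the mask without ever shifting by 64 or more bits.
  const uint64_t mask = (kept_bases == 32)
                            ? ~uint64_t{0}
                            : (uint64_t{1} << (2 * kept_bases)) - 1;
  uint64_t key = 0;
  for (char base : barcode) {
    uint64_t code = 0;
    switch (base) {
      case 'C': code = 1; break;
      case 'G': code = 2; break;
      case 'T': code = 3; break;
      default:  code = 0; break;  // 'A', 'N', and anything else
    }
    key = ((key << 2) | code) & mask;  // older bases fall off the top
  }
  return key;
}
```

Under that assumption, the two barcodes from the log both give `PackLastBases(barcode, 20) == 191644134523`, the BC-KEY reported above, because they share their last 20 bases (AGTAGCTGTGACCGTACTGT). With 32 bases kept, these two happen to get different keys, but any two barcodes longer than 32bp that agree in their last 32 bases would still clash.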

chris-cheshire commented on August 24, 2024

It's a custom single-cell protocol that follows HyDrop. The barcode is 50bp because of the combinatorial barcode extensions during barcode bead production.

The barcode consists of three 10bp barcode segments with two repeated sequences in between. The actual barcode is therefore only 30bp long, but to detect it from the index reads it is 50bp.

Is there any way I can get the software to work with it?

BC1-REPSEQ1-BC2-REPSEQ2-BC3

haowenz commented on August 24, 2024

@mourisl can you comment here? Does the current "--read-format" option support this kind of input?

mourisl commented on August 24, 2024

Thanks for the ping. I don't think Chromap can handle non-consecutive barcodes or barcodes longer than 32bp, so you may need to preprocess the data to concatenate BC1-BC2-BC3.

In the future, we will come up with a way to specify the non-consecutive case, though it is likely that in the output it will be BC1-BC2-BC3 instead of BC1-REPSEQ1-BC2-REPSEQ2-BC3.
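The preprocessing step could look roughly like the sketch below. It is not part of Chromap. `ConcatenateSegments` is a hypothetical helper, and the segment layout in the usage note (10bp segments starting at offsets 0, 20 and 40) is an assumption that has to be replaced with the offsets from the actual protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: keeps only the given (offset, length) segments of
// a barcode sequence and concatenates them in order. Returns an empty
// string if the input is too short to contain every segment.
std::string ConcatenateSegments(
    const std::string &sequence,
    const std::vector<std::pair<std::size_t, std::size_t>> &segments) {
  std::string result;
  for (const auto &segment : segments) {
    const std::size_t offset = segment.first;
    const std::size_t length = segment.second;
    if (offset + length > sequence.size()) {
      return std::string();  // segment runs past the end of the input
    }
    result.append(sequence, offset, length);
  }
  return result;
}
```

With the assumed layout `{{0, 10}, {20, 10}, {40, 10}}`, the 97th whitelist entry above becomes the 30bp barcode TGTAGCAAGTCCGGTTACATTGACCGTACT. The same segments would have to be applied to the whitelist, to the barcode reads, and to the quality string of each barcode read so that sequence and quality lengths stay equal.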

chris-cheshire commented on August 24, 2024

Thanks for your responses. This is indeed what I did yesterday, and the aligner has now completed running with no errors.

dawe commented on August 24, 2024

Following up on this (closed) issue: I am going to process some SHARE-seq data, which produces barcodes longer than 32bp through combinatorial indexing. I understand that non-DNA characters are not allowed, so I cannot create a whitelist with arbitrary strings (representing the barcode sequences). Is that correct?
