My aim is to compute the entropy of all unmapped reads in each patient sample of whole genome sequencing data and remove low-complexity sequences before doing a microbiome analysis. Filtering works for the smaller samples, but the larger ones cause a segmentation fault. I use R 4.1.0 and Biostrings 2.60.1. I have narrowed the crash down to trinucleotideFrequency:
> testCase <- readFastq("OSCC_1-N_unmapped_all.fastq.gz")
> testCase
class: ShortReadQ
length: 48324630 reads; width: 150 cycles
> trinucleotideFrequency(sread(testCase))
*** caught segfault ***
address 0x7fa808276fb8, cause 'memory not mapped'
Traceback:
1: .Call2("XStringSet_oligo_frequency", x, width, step, as.prob, as.array, fast.moving.side, with.labels, simplify.as, base_codes, PACKAGE = "Biostrings")
2: .local(x, width, step, as.prob, as.array, fast.moving.side, with.labels, ...)
3: oligonucleotideFrequency(x, 3L, step = step, as.prob = as.prob, as.array = as.array, fast.moving.side = fast.moving.side, with.labels = with.labels, ...)
4: oligonucleotideFrequency(x, 3L, step = step, as.prob = as.prob, as.array = as.array, fast.moving.side = fast.moving.side, with.labels = with.labels, ...)
5: trinucleotideFrequency(sread(testCase))
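
One possible workaround is to process the reads in chunks so that no single call has to return a result for all 48,324,630 reads at once. This is only a sketch under an assumption: the full result would have 48,324,630 x 64 = 3,092,776,320 cells, which is more than 2^31 - 1 (2,147,483,647), so an overflow of a 32-bit index is a plausible cause, but that has not been confirmed here. The helper names shannon_entropy and chunked_entropy, the chunk_size argument, and the cutoff mentioned in the comments are all hypothetical and not part of Biostrings.

```r
# Shannon entropy (in bits) of each row of a count matrix
# (one row per read, one column per trinucleotide).
shannon_entropy <- function(counts) {
  p <- counts / rowSums(counts)
  # Treat 0 * log2(0) as 0 so absent trinucleotides contribute nothing.
  terms <- ifelse(p > 0, p * log2(p), 0)
  -rowSums(terms)
}

# Apply freq_fun to consecutive chunks of reads and return one entropy
# value per read. freq_fun must return a count matrix with one row per read.
# chunk_size is a hypothetical tuning parameter; pick it to fit your memory.
chunked_entropy <- function(reads, freq_fun, chunk_size = 1e6) {
  n <- length(reads)
  if (n == 0) return(numeric(0))
  entropy <- numeric(n)
  for (s in seq(1, n, by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1, n)
    entropy[idx] <- shannon_entropy(freq_fun(reads[idx]))
  }
  entropy
}

# Intended use with the objects from the transcript above (not run here):
#   ent <- chunked_entropy(sread(testCase), trinucleotideFrequency)
#   filtered <- testCase[ent > cutoff]   # cutoff: a threshold you choose
```

Each chunk only ever produces a matrix of at most chunk_size x 64 counts, and only the per-read entropy vector is kept. A read whose counts are all zero (for example, one made up entirely of ambiguous bases) would yield NA and needs separate handling before filtering.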