Hi @Rafael-Cast, thanks for reporting this issue.
We'll look into this right away.
Kind regards,
Rich
from pod5-file-format.
Hi @Rafael-Cast ,
I've had a look internally and can't trivially reproduce this by writing more than 1000 reads.
Can you provide more info about how you are calling the function to add reads?
Thanks,
- George
Hi, thanks for looking into the issue.
I'm reading a POD5 file and copying it with a different compression method (either uncompressed or VBZ, as requested by the input). I could provide the full source code, but it's long (~400 lines) and not the tidiest. Nonetheless, if you think it would help, I have no problem sharing it.
Here is the relevant snippet:
```cpp
for (size_t current_batch_idx = 0; current_batch_idx < batch_count; current_batch_idx++)
{
    Pod5ReadRecordBatch_t *current_batch;
    LOG_PROGRAM_ERROR(pod5_get_read_batch(&current_batch, reader, current_batch_idx))
    size_t batch_row_count;
    LOG_PROGRAM_ERROR(pod5_get_read_batch_row_count(&batch_row_count, current_batch))
    ReadBatchRowInfo_t read_record_batch_array[batch_row_count]; // For large batches this could cause a stack overflow. Should push data to heap
    size_t sample_count[batch_row_count];
    int16_t *signal[batch_row_count];
    for (size_t current_batch_row = 0; current_batch_row < batch_row_count;
         current_batch_row++)
    {
        // Load ReadBatchRowInfo to memory
        uint16_t read_table_version;
        LOG_PROGRAM_ERROR(pod5_get_read_batch_row_info_data(
            current_batch, current_batch_row, READ_BATCH_ROW_INFO_VERSION,
            &read_record_batch_array[current_batch_row], &read_table_version));
        LOG_PROGRAM_ERROR(pod5_get_read_complete_sample_count(
            reader, current_batch, current_batch_row,
            &sample_count[current_batch_row]))
        signal[current_batch_row] =
            (int16_t *)malloc(sizeof(int16_t) * sample_count[current_batch_row]);
        LOG_PROGRAM_ERROR(pod5_get_read_complete_signal(
            reader, current_batch, current_batch_row,
            sample_count[current_batch_row], signal[current_batch_row]))
    }
    uint32_t signal_length[batch_row_count];
    for (size_t i = 0; i < batch_row_count; i++)
    {
        signal_length[i] = sample_count[i];
    }
    static ReadBatchRowInfoArray_t flattened_array;
    transform_read_data_batch_array(read_record_batch_array, batch_row_count, current_batch, writer, &flattened_array);
    LOG_PROGRAM_ERROR(pod5_add_reads_data(
        writer, batch_row_count, READ_BATCH_ROW_INFO_VERSION, &flattened_array,
        const_cast<const int16_t **>(signal), signal_length))
    free_batch_array(&flattened_array);
    for (size_t i = 0; i < batch_row_count; i++)
    {
        free(signal[i]);
    }
    LOG_PROGRAM_ERROR(pod5_free_read_batch(current_batch))
}
```
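The comment in the snippet flags the variable-length arrays as a stack-overflow risk (they are also a GCC extension rather than standard C++). Below is a minimal sketch of moving the per-batch signal buffers to the heap with `std::vector`. `BatchSignals` and its methods are hypothetical names, and the pod5 calls themselves are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: owns the decoded signal of every read in one batch.
// std::vector keeps the buffers on the heap, so a large batch cannot
// overflow the stack the way a variable-length array can, and no manual
// free() loop is needed.
class BatchSignals {
public:
    explicit BatchSignals(std::size_t row_count) : samples_(row_count) {}

    // Allocate storage for one read and return a pointer the reader call can
    // fill (this is where the snippet used malloc).
    int16_t* reserve_row(std::size_t row, std::size_t sample_count) {
        samples_.at(row).resize(sample_count);
        return samples_[row].data();
    }

    // One read-only pointer per read, in row order; `.data()` on the result
    // has type `const int16_t**`.
    std::vector<const int16_t*> pointers() const {
        std::vector<const int16_t*> out;
        out.reserve(samples_.size());
        for (const auto& s : samples_) {
            out.push_back(s.data());
        }
        return out;
    }

    // Sample count of each read, in row order.
    std::vector<std::size_t> sample_counts() const {
        std::vector<std::size_t> out;
        out.reserve(samples_.size());
        for (const auto& s : samples_) {
            out.push_back(s.size());
        }
        return out;
    }

private:
    std::vector<std::vector<int16_t>> samples_;
};
```

In the loop, `reserve_row(current_batch_row, sample_count[current_batch_row])` would take the place of `malloc`. A local `auto ptrs = batch.pointers();` would then supply `ptrs.data()` in place of `signal`. Keeping it in a local ensures the pointer array outlives the writer call. The same `std::vector` approach would work for `read_record_batch_array`.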
Here:
```cpp
uint32_t signal_length[batch_row_count];
for (size_t i = 0; i < batch_row_count; i++)
{
    signal_length[i] = sample_count[i];
}
```
is used to convert the array's element type from size_t to uint32_t (it might not be necessary, but it shouldn't be part of the problem).
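The snippet's `signal_length` declaration suggests that `pod5_add_reads_data` takes its signal lengths as a `uint32_t` array. If so, the copy loop is needed. On typical 64-bit platforms `size_t` is wider than `uint32_t`, so the `sample_count` array cannot simply be cast. Below is a minimal sketch of the same conversion with an overflow check added; `to_signal_lengths` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Convert per-read sample counts (size_t, as filled in by the reader loop)
// into a uint32_t lengths array, refusing any value that the narrowing
// conversion would silently truncate.
std::vector<uint32_t> to_signal_lengths(const std::vector<std::size_t>& sample_counts) {
    std::vector<uint32_t> lengths;
    lengths.reserve(sample_counts.size());
    for (std::size_t count : sample_counts) {
        if (count > std::numeric_limits<uint32_t>::max()) {
            throw std::overflow_error("sample count does not fit in uint32_t");
        }
        lengths.push_back(static_cast<uint32_t>(count));
    }
    return lengths;
}
```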
And:
```cpp
static ReadBatchRowInfoArray_t flattened_array;
transform_read_data_batch_array(read_record_batch_array, batch_row_count, current_batch, writer, &flattened_array);
```
flattens the array of ReadBatchRowInfo_t into the ReadBatchRowInfoArray_t that pod5_add_reads_data expects (I couldn't find a way to directly feed what pod5_get_read_batch_row_info_data returns into what pod5_add_reads_data expects without flattening).
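Judging from this description, the reader fills one struct per read (array of structs), while the writer takes one array per field (struct of arrays). Below is a minimal sketch of that conversion. `RowInfo`, `RowInfoColumns`, `flatten`, and the four fields are hypothetical stand-ins, not the real pod5 structs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-read record ("array of structs" side).
struct RowInfo {
    std::array<uint8_t, 16> read_id;
    uint32_t read_number;
    int16_t run_info;
    int16_t pore_type;
};

// Hypothetical column layout ("struct of arrays" side): one contiguous
// array per field, so each field can be handed over as a single pointer.
struct RowInfoColumns {
    std::vector<std::array<uint8_t, 16>> read_id;
    std::vector<uint32_t> read_number;
    std::vector<int16_t> run_info;
    std::vector<int16_t> pore_type;
};

// Copy each field of each row into its column, preserving row order.
RowInfoColumns flatten(const std::vector<RowInfo>& rows) {
    RowInfoColumns cols;
    cols.read_id.reserve(rows.size());
    cols.read_number.reserve(rows.size());
    cols.run_info.reserve(rows.size());
    cols.pore_type.reserve(rows.size());
    for (const RowInfo& row : rows) {
        cols.read_id.push_back(row.read_id);
        cols.read_number.push_back(row.read_number);
        cols.run_info.push_back(row.run_info);
        cols.pore_type.push_back(row.pore_type);
    }
    return cols;
}
```

Each column's `.data()` would then fill the matching pointer field. Because the vectors own their memory, nothing needs freeing by hand. The columns must stay alive until the writer call returns.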
By the way, when the program doesn't crash on this assert, the Python script check_pod5_files_equal.py reports the produced files as consistent. Is this the "intended" use of that script? I'm using it as a kind of observational equivalence test.
Thanks,
Rafael.
I assume you are adding pore types and run infos in transform_read_data_batch_array, rather than just reusing the bare integer values?
Do you have a gdb core dump (and the built executable) I could have a look at? Or better, a full buildable project I can poke at?
Thanks,
- George
I'm adding both pore types and run infos in transform_read_data_batch_array.
The buildable project is at https://github.com/Rafael-Cast/pod5-file-format-debug.git on the "debug" branch.
This branch contains both the executable and the data. It is a fork of your project, with my code added under "examples". You shouldn't need any dependencies beyond those already included in your project.
To compile from scratch, run "bash install.sh"; "bash run.sh" then executes the failing test case.
I'm using gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-16) to compile the project.
Thanks,
Rafael.
Hi @Rafael-Cast ,
I get an error about a missing input file: ".../pod5-file-format-debug/batch12_new.pod5". I can't see it in the repo?
Thanks,
- George
Reviewing the code, though, this looks a bit dodgy:
https://github.com/Rafael-Cast/pod5-file-format-debug/blob/debug/c%2B%2B/examples/copy.cpp#L200
> run_info_id[i] = in_data[i].run_info;
It copies the run info id from the source file to the destination file, but none of the run info data.
This could cause the issue - although the error message should be better.
edit: running a test locally gives me the same call stack as you get - I'll get a better error message in.
Thanks,
- George
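The line `run_info_id[i] = in_data[i].run_info;` carries over an index into the source file's run-info table. Below is a minimal sketch of remapping such indices. It assumes the destination writer assigns its own index when a run info is added to it. Each run info is copied the first time a read references it, so it always exists before that read is written. `RunInfoRemap` and the `add_to_destination` callback are hypothetical. In a real program, the callback would wrap whatever reader/writer calls copy one run info.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <unordered_map>

// Maps run-info indices in the source file to the indices the destination
// writer assigned when the same run info was added to it.
class RunInfoRemap {
public:
    // Return the destination index for `source_index`, copying the run info
    // into the destination (via the hypothetical callback) on first use.
    int16_t ensure(int16_t source_index,
                   const std::function<int16_t(int16_t)>& add_to_destination) {
        auto it = map_.find(source_index);
        if (it != map_.end()) {
            return it->second;  // already copied: reuse its index
        }
        int16_t dest_index = add_to_destination(source_index);
        map_.emplace(source_index, dest_index);
        return dest_index;
    }

    // Look up an index that must already have been copied; fail loudly
    // instead of passing an index the destination does not know about.
    int16_t translate(int16_t source_index) const {
        auto it = map_.find(source_index);
        if (it == map_.end()) {
            throw std::out_of_range("run info was not copied before use");
        }
        return it->second;
    }

private:
    std::unordered_map<int16_t, int16_t> map_;
};
```

Copying lazily on first use also skips run infos that no read references. Copying all of them before the read loop works just as well, as long as the read rows use the translated indices.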
I forgot to add the data sample which crashes. I've added it now to the repo.
When copying the read data with pod5_add_reads_data, only the run info id is required; that's why I'm only copying the ID on that line.
The run info itself is copied after the read data. Is this correct, or should the run info be copied before the read data?
On the other hand, changing this line (https://github.com/Rafael-Cast/pod5-file-format-debug/blob/070836d28828c5956f312216a6a61ec61c8627e5/c%2B%2B/examples/copy.cpp#L357)
from:
```cpp
const Pod5WriterOptions_t writer_options = {0, comp_opt, 0, 0};
```
to:
```cpp
const Pod5WriterOptions_t writer_options = {0, comp_opt, 0, 10000};
```
solves the issue on versions prior to 0.2.0.
Nonetheless, I think the problem is that I'm assuming something about your API that isn't true. I'll try revising the line you pointed out, writing the run info first and then the read data, and try again.
This does seem to be an issue with my code. I'm sorry for (probably) reporting a nonexistent bug.
edit: Markdown to show code
Thanks,
Rafael.
Yes - if you add the run info data first, the issue will go away. But as you say, the API should reject this unsupported order with an error, not a crash.
I'll get that new error in asap.
- George
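The fix George describes turns an unsupported call order into an error return rather than a crash. Below is a minimal sketch of that kind of argument check, using a hypothetical `SketchWriter` with hypothetical `Status` codes. It only illustrates the idea and is not how pod5 implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical status codes for the sketch.
enum class Status { Ok, InvalidArgument };

// Toy writer: run infos live in a table and reads refer to them by index.
// A read whose run info index has not been added yet is rejected with an
// error at call time, instead of tripping an assert later on.
class SketchWriter {
public:
    int16_t add_run_info(std::string acquisition_id) {
        run_infos_.push_back(std::move(acquisition_id));
        return static_cast<int16_t>(run_infos_.size() - 1);
    }

    Status add_read(int16_t run_info_index, std::string* error) {
        if (run_info_index < 0 ||
            static_cast<std::size_t>(run_info_index) >= run_infos_.size()) {
            if (error != nullptr) {
                *error = "invalid run info index " + std::to_string(run_info_index);
            }
            return Status::InvalidArgument;
        }
        read_run_info_.push_back(run_info_index);
        return Status::Ok;
    }

    std::size_t read_count() const { return read_run_info_.size(); }

private:
    std::vector<std::string> run_infos_;
    std::vector<int16_t> read_run_info_;
};
```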
Hi @Rafael-Cast ,
0.2.2 is now live; it should return an error when adding a read with an invalid run info id.
Hope that helps!
- George
Hi @jorj1988,
Thanks for the help and update!
Rafael