I am currently building something based on this project (and things are going quite well, thanks to your good work). However, I noticed that the cue_points() method doesn't seem to work as expected with the cue points that a Sound Devices MixPre-series recorder produces when the cue point button is pressed.
Expectation
Running cue_points() should return a Result<Vec<Cue>, ParserError> where Cue::frame reflects the cue's position.
Problem
In the case of the Sound Devices file, it returns a list of Cues of the correct length, but the value of the frame field is always zero.
Details
When I open the file in iZotope RX and save it, cue_points() works as expected on the saved copy. That raises the question: what is actually different between the files?
Sound Devices
See the file here: sounddevices.zip
Chunk in Hex:
63 75 65 20 94 00 00 00 06 00 00 00 00 00 00 00 00 00 00 00 64 61 74 61 00 00 00 00 00 00 00 00 00 60 01 00 01 00 00 00 00 00 00 00 64 61 74 61 00 00 00 00 00 00 00 00 00 B0 02 00 02 00 00 00 00 00 00 00 64 61 74 61 00 00 00 00 00 00 00 00 00 A0 03 00 03 00 00 00 00 00 00 00 64 61 74 61 00 00 00 00 00 00 00 00 00 80 04 00 04 00 00 00 00 00 00 00 64 61 74 61 00 00 00 00 00 00 00 00 00 D0 05 00 05 00 00 00 00 00 00 00 64 61 74 61 00 00 00 00 00 00 00 00 00 E0 05 00
iZotope
Versus the file here: izotope.zip
Chunk in Hex:
63 75 65 20 94 00 00 00 06 00 00 00 01 00 00 00 00 60 01 00 64 61 74 61 00 00 00 00 00 00 00 00 00 60 01 00 02 00 00 00 00 B0 02 00 64 61 74 61 00 00 00 00 00 00 00 00 00 B0 02 00 03 00 00 00 00 A0 03 00 64 61 74 61 00 00 00 00 00 00 00 00 00 A0 03 00 04 00 00 00 00 80 04 00 64 61 74 61 00 00 00 00 00 00 00 00 00 80 04 00 05 00 00 00 00 D0 05 00 64 61 74 61 00 00 00 00 00 00 00 00 00 D0 05 00 06 00 00 00 00 E0 05 00 64 61 74 61 00 00 00 00 00 00 00 00 00 E0 05 00 4C 49 53 54 04 00 00 00 61 64 74 6C
Differences
The most obvious difference here is that the Sound Devices file has no adtl chunk. I deleted that part in izotope.wav with a hex editor and the result still came out right, so I assume this plays no role.
I spotted two differences that might have more impact here (both visible in the first cue point of the hex dumps above):
- Sound Devices starts counting with an ID of 0, iZotope with an ID of 1
- Sound Devices leaves the value for Position at 0, iZotope fills it with the value both use in the Sample Offset field
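To make the two differences easier to see, here is a minimal sketch that decodes the first cue point from the beginning of each hex dump above. It assumes the usual cue chunk layout (8-byte chunk header, 4-byte point count, then 24 bytes per point, all little-endian). The struct and function names are my own for illustration and are not this crate's API.

```rust
/// One 24-byte cue point record. Field names are illustrative, not the crate's.
#[derive(Debug)]
struct RawCuePoint {
    id: u32,
    position: u32,
    data_chunk_id: [u8; 4],
    chunk_start: u32,
    block_start: u32,
    sample_offset: u32,
}

/// Reads a little-endian u32 starting at byte `at`.
fn le_u32(bytes: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// Decodes cue point `index` from a `cue ` chunk that includes its 8-byte header.
/// Returns None if the tag is wrong, the index is out of range, or bytes are missing.
fn parse_cue_point(chunk: &[u8], index: usize) -> Option<RawCuePoint> {
    if chunk.len() < 12 || &chunk[0..4] != b"cue " {
        return None;
    }
    let count = le_u32(chunk, 8) as usize;
    let start = 12 + index * 24;
    if index >= count || chunk.len() < start + 24 {
        return None;
    }
    Some(RawCuePoint {
        id: le_u32(chunk, start),
        position: le_u32(chunk, start + 4),
        data_chunk_id: [chunk[start + 8], chunk[start + 9], chunk[start + 10], chunk[start + 11]],
        chunk_start: le_u32(chunk, start + 12),
        block_start: le_u32(chunk, start + 16),
        sample_offset: le_u32(chunk, start + 20),
    })
}

// First 36 bytes of each dump: chunk header, point count, first cue point only.
const SOUND_DEVICES: [u8; 36] = [
    0x63, 0x75, 0x65, 0x20, 0x94, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x64, 0x61, 0x74, 0x61,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x60, 0x01, 0x00,
];
const IZOTOPE: [u8; 36] = [
    0x63, 0x75, 0x65, 0x20, 0x94, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00,
    0x01, 0x00, 0x00, 0x00, 0x00, 0x60, 0x01, 0x00, 0x64, 0x61, 0x74, 0x61,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x60, 0x01, 0x00,
];

fn main() {
    for (name, bytes) in [("Sound Devices", &SOUND_DEVICES), ("iZotope", &IZOTOPE)].iter() {
        let p = parse_cue_point(&bytes[..], 0).expect("first cue point should decode");
        println!(
            "{}: id={} position={} chunk={} chunk_start={} block_start={} sample_offset={}",
            name,
            p.id,
            p.position,
            String::from_utf8_lossy(&p.data_chunk_id),
            p.chunk_start,
            p.block_start,
            p.sample_offset
        );
    }
}
```

Both records carry the same Sample Offset; only ID and Position differ, which matches the two bullet points above.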
The technical documentation linked in your docs defines Position as:
The position specifies the sample offset associated with the cue point in terms of the sample's position in the final stream of samples generated by the play list. Said in another way, if a play list chunk is specified, the position value is equal to the sample number at which this cue point will occur during playback of the entire play list as defined by the play list's order. If no play list chunk is specified this value should be 0.
I can't find any plst chunk in the data, so iZotope's implementation is probably at fault here (or I just don't know enough about the matter). However, iZotope RX itself can read the Sound Devices cue chunks, otherwise it would not be able to save them in its own format. Reliably reading cue points when they are present is probably a better option than ignoring them because they don't precisely fit the defined format.
Cross checking
ffprobe has an option to list cues as well. When I run ffprobe -i sounddevices.wav -print_format json -show_chapters -loglevel error I get the following (expected) data:
{
    "chapters": [
        {
            "id": 0,
            "time_base": "1/44100",
            "start": 90112,
            "start_time": "2.043356",
            "end": 176128,
            "end_time": "3.993832"
        },
        {
            "id": 1,
            "time_base": "1/44100",
            "start": 176128,
            "start_time": "3.993832",
            "end": 237568,
            "end_time": "5.387029"
        },
        {
            "id": 2,
            "time_base": "1/44100",
            "start": 237568,
            "start_time": "5.387029",
            "end": 294912,
            "end_time": "6.687347"
        },
        {
            "id": 3,
            "time_base": "1/44100",
            "start": 294912,
            "start_time": "6.687347",
            "end": 380928,
            "end_time": "8.637823"
        },
        {
            "id": 4,
            "time_base": "1/44100",
            "start": 380928,
            "start_time": "8.637823",
            "end": 385024,
            "end_time": "8.730703"
        },
        {
            "id": 5,
            "time_base": "1/44100",
            "start": 385024,
            "start_time": "8.730703",
            "end": 452932,
            "end_time": "10.270567"
        }
    ]
}
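The start values reported by ffprobe are the Sample Offset values from the Sound Devices hex dump (for example 00 60 01 00 is 0x00016000 = 90112), and the start_time values follow from dividing by the sample rate. A small sketch of that arithmetic, assuming the 44100 Hz rate implied by the time_base above; the function name is hypothetical:

```rust
/// Converts a sample offset (in frames) to seconds at the given sample rate.
fn frames_to_seconds(frame: u32, sample_rate: u32) -> f64 {
    f64::from(frame) / f64::from(sample_rate)
}

fn main() {
    // Sample Offset values taken from the Sound Devices cue chunk.
    let offsets: [u32; 6] = [0x0001_6000, 0x0002_B000, 0x0003_A000, 0x0004_8000, 0x0005_D000, 0x0005_E000];
    for frame in offsets.iter() {
        println!("{} frames -> {:.6} s", frame, frames_to_seconds(*frame, 44_100));
    }
}
```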
Fix
Maybe changing this in cue.rs would already fix it?
It is likely that I am too optimistic here, though.
raw_cues.iter()
    .map(|i| {
        Cue {
            //ident : i.cue_point_id,
            frame: i.frame_offset, // <-- was i.frame
            length: {
                raw_adtl.ltxt_for_cue_point(i.cue_point_id).first()
                    .filter(|x| x.purpose == FourCC::make(b"rgn "))
                    .map(|x| x.frame_length)
            },
            label: {
                raw_adtl.labels_for_cue_point(i.cue_point_id).iter()
                    .map(|s| convert_to_cue_string(&s.text))
                    .next()
            },
            note: {
                raw_adtl.notes_for_cue_point(i.cue_point_id).iter()
                    //.filter_map(|x| str::from_utf8(&x.text).ok())
                    .map(|s| convert_to_cue_string(&s.text))
                    .next()
            }
        }
    }).collect()
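If switching to frame_offset unconditionally feels too risky, a more tolerant rule might be to prefer the Sample Offset and only fall back to Position when the Sample Offset is zero. The sketch below is only an idea: RawCueStandIn is a hypothetical stand-in whose field names mirror the snippet above, and I have not checked whether any writer actually fills Position but leaves Sample Offset empty.

```rust
/// Hypothetical stand-in for the crate's raw cue record; only the two
/// fields relevant to this issue are modelled.
struct RawCueStandIn {
    frame: u32,        // "Position" in the cue chunk
    frame_offset: u32, // "Sample Offset" in the cue chunk
}

/// Prefers the Sample Offset and falls back to Position when it is zero.
/// (Assumption: a file that fills neither field really does mean frame 0.)
fn resolve_frame(raw: &RawCueStandIn) -> u32 {
    if raw.frame_offset != 0 {
        raw.frame_offset
    } else {
        raw.frame
    }
}

fn main() {
    // First cue point as written by each tool, per the hex dumps above.
    let sound_devices = RawCueStandIn { frame: 0, frame_offset: 90112 };
    let izotope = RawCueStandIn { frame: 90112, frame_offset: 90112 };
    println!("Sound Devices: {}", resolve_frame(&sound_devices));
    println!("iZotope:       {}", resolve_frame(&izotope));
}
```

With this rule both files from this issue resolve to the same frame for the first cue point.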